
LM Studio: Download Models Step by Step | WikiWayne

Downloading models in LM Studio is mostly point-and-click: open the Discover tab, search a model name, and pick a quant from the dropdown. The only real trick is reading the quant labels so you don't grab a file that won't fit in your memory. This guide walks through the whole flow from start to finish, decodes the GGUF and quant naming, and gives you a sizing rule so you stop guessing.

If you're still deciding between tools before you commit, my LM Studio vs Ollama vs llama.cpp comparison is the parent guide this one feeds into. Come back here once you've settled on LM Studio.

What is LM Studio and what format does it download?

LM Studio is a free desktop app (Windows, macOS, Linux) that gives you a GUI catalog for finding, downloading, and chatting with open-weight LLMs locally. Under the hood it runs GGUF files through a bundled llama.cpp engine, plus an MLX engine on Apple Silicon for native Metal acceleration.

GGUF (GPT-Generated Unified Format) is the single-file container format that packs a model's weights, tokenizer, and metadata together so a runner can load it without any extra config. Almost every model you'll download in LM Studio is a GGUF, and the quant label baked into the filename is the one thing you actually have to understand. More on the format in my GGUF explainer.

How do I download a model in LM Studio, step by step?

Here's the full flow, assuming a fresh install.

Open the Discover tab. It's the magnifying-glass icon in the left sidebar (older builds call it the Search tab).
Search a model name. Type something concrete like Qwen3 8B, Llama 3.1 8B Instruct, Gemma 3 12B, or Mistral Nemo. LM Studio queries Hugging Face and lists matching repos.
Pick the repo, then expand the quant list. Each result has a dropdown of available quantizations. This is where people freeze up, so I'll decode the labels in the next section.
Watch the fit indicator. LM Studio tags each quant with a rough "will this fit your machine" estimate based on your detected RAM/VRAM. Green-ish means it should load; a warning means you're cutting it close or over.
Click Download. Files land in ~/.lmstudio/models/ (or your configured models folder) in a publisher/model/ folder structure.
Load it from the Chat tab. Hit the model selector at the top, pick what you just downloaded, and adjust the GPU offload slider before loading.
Test with one prompt. Confirm it responds and check tokens/sec in the bottom bar.
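If you want to confirm from a terminal that step 5 put the file where you expect, a minimal sketch like this lists every GGUF under the models folder with its size. It assumes the default ~/.lmstudio/models path; `list_gguf` and the `MODELS_DIR` override are hypothetical names for this sketch, not LM Studio settings.

```bash
# Hypothetical helper: list every .gguf file under a models folder with its size.
# Defaults to ~/.lmstudio/models (assumed default path); pass a path or set
# MODELS_DIR if you changed the models folder in LM Studio.
list_gguf() {
  local dir="${1:-${MODELS_DIR:-$HOME/.lmstudio/models}}"
  # -type f keeps files only; du -h prints a human-readable size per file,
  # and sort -k2 orders the output by path (publisher/model/file).
  find "$dir" -type f -name '*.gguf' -exec du -h {} + | sort -k2
}
```

Run `list_gguf` with no argument for the default folder, or `list_gguf /path/to/models` if you moved it.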

That's it. The download itself is trivial; everything that goes wrong is a quant-and-memory mismatch, which the rest of this guide is about.

How do I read LM Studio's quant labels without guessing?

Quantization is the process of storing model weights at lower numeric precision (4-bit, 5-bit, 8-bit instead of 16-bit) to shrink the file and cut memory use, at a small cost to quality. The label tells you exactly which scheme a file uses.

Here's how to parse a typical filename like Qwen3-8B-Q4_K_M.gguf:

The number (Q4, Q5, Q8) is roughly the bits per weight. Lower = smaller file, less memory, slightly lower quality. Per-block scales and tensors kept at higher precision push the real average a little above the headline number.
_K means a "K-quant," the newer mixed-precision method that beats the legacy quants (such as Q4_0) at the same size. You want K-quants.
The suffix (_S, _M, _L) is Small / Medium / Large within that bit level, a finer trade-off between size and quality.
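The same parsing can be sketched as a small bash function. The regex approximates the convention described above (Q plus bits, then _K or a legacy _0/_1, then an optional _S/_M/_L); it is not an official naming grammar, `parse_quant` is a hypothetical helper, and IQ labels come back as unknown.

```bash
# Hypothetical helper: split a GGUF filename into bits, quant family, and size.
# The regex approximates the naming convention; it is not an official grammar.
parse_quant() {
  local name="$1"
  local re='Q([0-9])_(K|0|1)(_([SML]))?'
  if [[ "$name" =~ $re ]]; then
    local bits="${BASH_REMATCH[1]}"
    local family="legacy"                   # _0 / _1: older block formats like Q4_0, Q8_0
    [[ "${BASH_REMATCH[2]}" == "K" ]] && family="k-quant"
    local size="${BASH_REMATCH[4]:-none}"   # S / M / L, or none (e.g. Q6_K)
    echo "bits=$bits family=$family size=$size"
  else
    echo "unknown"                          # IQ labels and anything else land here
    return 1
  fi
}
```

For example, `parse_quant Qwen3-8B-Q4_K_M.gguf` prints `bits=4 family=k-quant size=M`.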

Quant cheat sheet for the labels you'll actually see:

Quant label	Bits/weight	Quality	When I reach for it
`Q8_0`	8	Near-lossless	Plenty of memory, want max quality
`Q6_K`	~6.5	Excellent	Sweet spot when you have headroom
`Q5_K_M`	~5.5	Very good	Balanced default if Q4 feels lossy
`Q4_K_M`	~4.5	Good	My default: best size/quality balance
`Q4_K_S`	~4	Decent	Squeezing into tight memory
`Q3_K_M`	~3.5	Noticeably degraded	Last resort to make it fit at all
`IQ` (e.g. `IQ4_XS`)	varies	Good per-byte	Tightest fits; uses imatrix, slightly slower

If you only remember one thing: Q4_K_M is the right default for almost everyone. Start there, and only move up to Q5/Q6/Q8 if you have memory to spare or down to Q3/IQ if it won't load. I go deeper on the trade-off in Q4 vs Q8: the quant quality tradeoff and the broader quantization explainer.

How much memory does a given quant need?

Quick mental math: file size on disk roughly equals the RAM/VRAM the weights need, plus a bit of overhead for context (the KV cache). So a 5 GB GGUF wants about 5 GB of memory for weights, then add maybe 0.5-2 GB for context depending on the context length you set.
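As a rough sketch of that rule of thumb, the helper below adds a context allowance to the file size and compares the total with your memory. `fits_in_memory` is hypothetical, and the 1.5 GB default is an assumed midpoint of the 0.5-2 GB range, not a measurement.

```bash
# Hypothetical helper: weights ~= file size, plus a context (KV cache) allowance.
# Usage: fits_in_memory <file_gb> <memory_gb> [context_gb]; the 1.5 GB default
# is an assumed midpoint of the 0.5-2 GB range.
fits_in_memory() {
  local file_gb="$1" mem_gb="$2" ctx_gb="${3:-1.5}"
  awk -v f="$file_gb" -v m="$mem_gb" -v c="$ctx_gb" 'BEGIN {
    need = f + c
    printf "need ~%.1f GB of %.1f GB: %s\n", need, m, (need <= m ? "should fit" : "too big")
    exit (need <= m ? 0 : 1)   # exit status mirrors the verdict
  }'
}
```

For example, `fits_in_memory 5 8` reports about 6.5 GB needed out of 8 GB; pass a third argument if you plan on a long context.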

Ballpark download sizes by model size at Q4_K_M (verify the exact size in LM Studio's dropdown; these drift by model):

Model size	~Q4_K_M file	~Memory to run comfortably
3-4B	~2-3 GB	6 GB+
7-8B	~4-5 GB	8 GB+
12-14B	~7-9 GB	12-16 GB
27-32B	~16-20 GB	24-32 GB
70B	~40 GB	48 GB+ (or heavy CPU offload)
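The file sizes above roughly follow parameters × bits per weight ÷ 8. This sketch applies that arithmetic using the cheat sheet's approximate bits/weight; real K-quant files usually come out a bit larger because some tensors keep more precision, so treat the result as a lower-end estimate. `estimate_gguf_gb` is a hypothetical helper.

```bash
# Hypothetical helper: approximate GGUF size in GB (10^9 bytes).
# Usage: estimate_gguf_gb <params_in_billions> <bits_per_weight>
# Ignores metadata and higher-precision tensors, so real files run a bit larger.
estimate_gguf_gb() {
  local params_b="$1" bpw="$2"
  awk -v p="$params_b" -v b="$bpw" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

For example, `estimate_gguf_gb 70 4.5` gives about 39.4 GB, in line with the ~40 GB row.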

These are rough. Treat LM Studio's own fit estimate as a starting hint and confirm on your own stack; actual usage shifts with context length and engine. For the full math, see how much VRAM for Llama 3 8B and the general VRAM requirements guide.
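Context length matters because of the KV cache, whose size you can work out directly: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per value. The defaults below assume a Llama 3 8B-style layout (32 layers, 8 KV heads, head dimension 128) and an unquantized 16-bit cache; `kv_cache_gib` is a hypothetical helper, and your model's architecture and your engine's cache settings may differ.

```bash
# Hypothetical helper: KV cache size in GiB for a given context length.
# bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes_per_value
# Defaults assume a Llama 3 8B-style layout with a 16-bit (2-byte) cache.
kv_cache_gib() {
  local ctx="$1" layers="${2:-32}" kv_heads="${3:-8}" head_dim="${4:-128}" bytes="${5:-2}"
  awk -v c="$ctx" -v l="$layers" -v h="$kv_heads" -v d="$head_dim" -v b="$bytes" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * b / (1024 ^ 3) }'
}
```

With those defaults, 8,192 tokens of context works out to 1.00 GiB and 32,768 tokens to 4.00 GiB, on top of the weights.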

Which model and quant should I pick? (decision list)

Match the choice to your hardware:

If you have 8 GB RAM/VRAM → grab a 7-8B model (Qwen3 8B, Llama 3.1 8B) at Q4_K_M. Keep context modest.
If you have 16 GB → run an 8B at Q5_K_M/Q6_K, or step up to a 12-14B (Gemma 3 12B, Phi-4) at Q4_K_M.
If you have 24 GB (e.g. a 3090/4090 or M-series with unified memory) → a 27-32B model (Qwen3 32B, Gemma 3 27B) at Q4_K_M is a great daily driver.
If you have 32 GB+ of unified memory on Apple Silicon → prefer the MLX version of the model when LM Studio offers it; it's tuned for Metal and often runs a touch faster than the GGUF.
If you're on a small or CPU-only box → stay at 3-4B and accept slower tokens/sec. See my CPU-only privacy tradeoff piece.
If it won't fit at Q4_K_M → drop to Q4_K_S, then an IQ4_XS, then a smaller model. Don't go below Q3 unless you have no other option.
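If you like having this as a lookup, here is the decision list above written as a bash function. The thresholds come straight from the list; treating anything under 8 GB as the small-box case is an assumption, the Apple Silicon MLX tip isn't encoded, and `suggest_model` is a hypothetical helper that takes whole gigabytes.

```bash
# Hypothetical helper: map available RAM/VRAM (whole GB) to a starting pick.
# Thresholds mirror the decision list; under 8 GB is assumed to be the
# small-box case. The MLX-on-Apple-Silicon advice is not encoded here.
suggest_model() {
  local mem_gb="$1"
  if   (( mem_gb >= 24 )); then echo "27-32B at Q4_K_M"
  elif (( mem_gb >= 16 )); then echo "8B at Q5_K_M/Q6_K, or 12-14B at Q4_K_M"
  elif (( mem_gb >= 8 ));  then echo "7-8B at Q4_K_M, modest context"
  else                          echo "3-4B (expect slower tokens/sec)"
  fi
}
```

If the suggestion still won't load, follow the last rule in the list: Q4_K_S, then IQ4_XS, then a smaller model.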

How do I import a GGUF I already downloaded?

If you already pulled a GGUF elsewhere (Hugging Face directly, or shared from another machine), you don't have to re-download. Drop the file into your models directory using the publisher/model folder convention:

# macOS / Linux default models path
~/.lmstudio/models/<publisher>/<model-name>/<file>.gguf

# Example
mkdir -p ~/.lmstudio/models/lmstudio-community/Qwen3-8B-GGUF
cp ~/Downloads/Qwen3-8B-Q4_K_M.gguf \
   ~/.lmstudio/models/lmstudio-community/Qwen3-8B-GGUF/

Restart LM Studio (or refresh My Models) and it'll appear in the Chat selector. The same GGUF works in Ollama and llama.cpp too, so you're not locked in. That portability is the whole point of the format.
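Before copying a file in, it can be worth checking that it really is a GGUF and not, say, a saved error page. GGUF files start with the 4-byte ASCII magic `GGUF`, so a minimal sketch only needs to read those bytes. It won't catch a download that was cut off partway, since that file still starts with the right bytes; compare the size against LM Studio's dropdown for that. `is_gguf` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the file starts with the GGUF magic bytes.
is_gguf() {
  local magic
  magic="$(head -c 4 "$1" 2>/dev/null)"   # first 4 bytes; empty if unreadable
  [[ "$magic" == "GGUF" ]]
}
```

For example, `is_gguf ~/Downloads/Qwen3-8B-Q4_K_M.gguf && cp ~/Downloads/Qwen3-8B-Q4_K_M.gguf ~/.lmstudio/models/lmstudio-community/Qwen3-8B-GGUF/` only copies when the header checks out.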

Why is my model slow or failing to load?

A few quick fixes for the common ones:

Loads then crashes / out of memory → the quant is too big. Pick a smaller quant or lower the GPU offload layers.
Painfully slow → too many layers are running on CPU. Bump the GPU offload slider up in the load settings; see GPU offload layers explained.
Won't fit even at Q4 → you're memory-bound. Smaller model, or look at the best budget GPU options.
Context errors on long prompts → lower the context length slider; long context eats memory fast on top of the weights.
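For the GPU offload slider, a rough starting value is how many layers fit in VRAM after leaving room for context. This sketch assumes every layer is about file size ÷ layer count (embedding and output layers make that imprecise) and reserves 1.5 GB by default; `offload_layers` is hypothetical, and LM Studio's own estimate should win if the two disagree.

```bash
# Hypothetical helper: starting guess for how many layers to offload to the GPU.
# Usage: offload_layers <file_gb> <n_layers> <vram_gb> [reserve_gb]
# Assumes equal-sized layers; reserve_gb covers context and runtime buffers.
offload_layers() {
  local file_gb="$1" n_layers="$2" vram_gb="$3" reserve_gb="${4:-1.5}"
  awk -v f="$file_gb" -v n="$n_layers" -v v="$vram_gb" -v r="$reserve_gb" 'BEGIN {
    per_layer = f / n                 # approximate GB per layer
    k = int((v - r) / per_layer)      # whole layers that fit in the budget
    if (k < 0) k = 0                  # not even one layer fits
    if (k > n) k = n                  # everything fits: offload all layers
    print k
  }'
}
```

For a 4.9 GB, 32-layer 8B file on a 4 GB card with 1 GB reserved, `offload_layers 4.9 32 4 1` prints 19.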

Bottom line

Downloading models in LM Studio is easy once the quant labels stop being a mystery. Search the Discover tab, default to Q4_K_M, check that the file size fits your memory with a little headroom, and only deviate up or down when you have a reason. On Apple Silicon, grab the MLX build when it's offered. Start small, confirm it loads and responds, then scale up. When you're ready to compare runners head-to-head, head back to the pillar guide.

Related Articles

LM Studio vs Ollama vs llama.cpp: Which Local AI Tool?

llama.cpp vs Ollama: When to Switch

Best Used GPUs for Local AI on a Budget (2026)
