Run AI on your own Mac. What hardware runs what, and when it's worth it vs the cloud.
32 GB
Sweet-spot RAM
$0
Per-Inference Cost
3
Tools to Know
10 min
Setup Time
Hardware tier table (Apple Silicon)
RAM      Tier        What runs
8 GB     Light       2-3B models · Gemma 2B, Phi-3 Mini, Llama 3.2 3B
16 GB    Practical   ~8B comfortably · Llama 3.1 8B, Gemma 2 9B, Qwen 7B
32 GB    Sweet spot  14B-24B · Mistral Small 24B, Qwen 14B, Gemma 2 27B
64 GB+   Power       70B at Q4 · Llama 3.3 70B, DeepSeek-V3 (sharded across machines)
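A quick sanity check for the tiers above: a model's weights occupy roughly parameters × bits-per-weight ÷ 8 bytes, and you need headroom on top for the KV cache and the OS. A rough sketch (the 1.2× overhead factor is a loose assumption, not a measured constant):

```python
def model_ram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough RAM estimate: weight bytes at the given quantization,
    inflated by an assumed overhead factor for KV cache and activations."""
    weights_gb = params_billion * bits_per_weight / 8  # 1e9 params * (bits/8) bytes ~= GB
    return weights_gb * overhead

# 70B at Q4 (~4 bits/weight): 35 GB of weights, ~42 GB with overhead,
# which is why the table puts 70B models in the 64 GB tier.
print(round(model_ram_gb(70, 4), 1))  # → 42.0
```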
The 3 tools
START HERE
Ollama
CLI. One command installs it; models are pulled on demand: ollama run gemma2:9b
LM Studio
Friendly GUI desktop app. Browse + chat. Best for non-CLI users.
llama.cpp
The C++ engine under Ollama + LM Studio. Use directly for max control.
Pro Tip
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1. Code that calls OpenAI works against Ollama with a one-line base-URL change.
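A minimal sketch of that URL swap using only the standard library, assuming Ollama is serving and gemma2:9b has been pulled (the prompt is just an example):

```python
import json
import urllib.request

# The one thing that changes vs. calling OpenAI: the base URL.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "gemma2:9b",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
}

def chat(url: str = OLLAMA_URL) -> str:
    """POST an OpenAI-style chat request to Ollama; return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    # Response shape matches OpenAI's chat completions format.
    return body["choices"][0]["message"]["content"]

# chat()  → the model's reply, once `ollama serve` is running
```

The official OpenAI client libraries work the same way: point their base_url at http://localhost:11434/v1 and pass any non-empty API key.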
Honest Limits
The best local models still trail the frontier (Opus / GPT-5 / Gemini Ultra) on hard reasoning. Expect 5-15 tok/sec locally vs 100+ in the cloud, and context windows of 32-128K tokens vs 1M+.
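To make the speed gap concrete, here's the arithmetic behind those rates (the 500-token answer length is illustrative; the rates come from the ranges above):

```python
def seconds_to_generate(tokens: int, tok_per_sec: float) -> float:
    """How long a completion takes to stream at a given decode rate."""
    return tokens / tok_per_sec

# A 500-token answer: ~50 s at a local 10 tok/sec vs ~5 s at a cloud 100 tok/sec.
print(seconds_to_generate(500, 10), seconds_to_generate(500, 100))  # → 50.0 5.0
```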
When local actually wins
Privacy-sensitive material · drafts that shouldn't be logged
Offline use · on a plane, or in a café with bad Wi-Fi
Batch / automation loops · per-call cloud costs would add up
Embeddings + RAG · small embedding models are blazing fast locally

10-minute Ollama setup
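The 10-minute Ollama setup mentioned above boils down to a few commands. A minimal sketch, assuming Homebrew on macOS (the model tag is just an example; any model from the Ollama library works):

```shell
# Install Ollama (or download the desktop app from ollama.com)
brew install ollama

# Start the background server (the desktop app does this automatically)
ollama serve &

# Pull and chat with a model in one step; roughly a 5-6 GB download for a 9B model at Q4
ollama run gemma2:9b
```

After this, the OpenAI-compatible API is live at localhost:11434 with no further configuration.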