Introduction
What if you could run a powerful AI model on your laptop — no internet, no subscriptions, no data leaving your machine? That's exactly what Ollama makes possible. Here's how I set it up on Apple M-Series chips.
What is Ollama?
Ollama is an open-source tool that lets you download and run large language models (LLMs) directly on your computer. No cloud. No API keys. No usage limits. It handles all the complex setup — model downloading, GPU acceleration, and inference — behind a dead-simple command-line interface.
On Apple Silicon Macs, Ollama now runs on Apple's MLX framework, which means it uses the Neural Engine and unified memory architecture for seriously fast performance — especially on M4 and M5 chips.
Your prompts never leave your machine. Zero telemetry.
Free forever. No token billing, no subscriptions.
No rate limits. Run it all day, all night.
Planes, remote areas, secure environments — no WiFi needed.
Installation
Getting Ollama running takes about two minutes. Go to ollama.com/download, grab the macOS build, drag it to Applications, and open it. You'll see a llama icon appear in your menu bar — that means the server is running.
Then open Terminal and pull your first model:
ollama run gemma4:e4b
It downloads the model (around 9.6GB for the 4B version) and drops you straight into a chat. That's it. No config files, no Python environments, no Docker.
Best Models for Apple M-Series
The right model depends entirely on how much RAM your Mac has. Every M-series chip — from M1 to M5 — runs these well.
| MODEL | RAM REQUIRED | BEST FOR | SPEED |
|---|---|---|---|
| llama3.2:3b | 8GB+ | Quick chats, fast answers | Instant |
| llama3.3:8b | 16GB+ | General use, writing, Q&A | Fast |
| qwen2.5-coder:7b | 16GB+ | Coding — best for dev work | Fast |
| qwen3:14b | 24GB+ | Coding + reasoning, multilingual | Moderate |
| gemma4:e4b | 16GB+ | Everyday tasks, well-rounded | Fast |
| gemma4:12b | 32GB+ | Complex reasoning, long context | Moderate |
Add a Chat UI
Chatting in Terminal works, but if you want a proper interface — something that looks like ChatGPT — install Open WebUI. It runs in your browser and connects directly to your local Ollama instance.
pip install open-webui
open-webui serve
Then open localhost:8080 in Safari. Full chat UI, conversation history, model switching — all local.
Is it as good as ChatGPT or Claude?
Honestly? Not quite — at least not at the 8B parameter scale. The frontier models from Anthropic and OpenAI are significantly more capable for complex reasoning, nuanced writing, and long-context tasks. But for day-to-day things like drafting emails, summarising text, answering questions, or helping with code? A local 8B model gets surprisingly close.
"Claude and ChatGPT for anything important or complex. Ollama for quick, private, offline, and unlimited tasks. They're complementary, not competing."
Final Thoughts
Running AI locally felt like a power-user thing a year ago. Now, thanks to Ollama and Apple Silicon, it's genuinely accessible — and fast enough to be part of your daily workflow. If you've got an M-series Mac, there's no reason not to have it running alongside your cloud AI tools.