Introduction

What if you could run a powerful AI model on your laptop — no internet, no subscriptions, no data leaving your machine? That's exactly what Ollama makes possible. Here's how I set it up on Apple M-Series chips.

What is Ollama?

Ollama is an open-source tool that lets you download and run large language models (LLMs) directly on your computer. No cloud. No API keys. No usage limits. It handles all the complex setup — model downloading, GPU acceleration, and inference — behind a dead-simple command-line interface.

On Apple Silicon Macs, Ollama now runs on Apple's MLX framework, which means it uses the Neural Engine and unified memory architecture for seriously fast performance — especially on M4 and M5 chips.

100%
PRIVATE

Your prompts never leave your machine. Zero telemetry.

$0
COST

Free forever. No token billing, no subscriptions.

USAGE

No rate limits. Run it all day, all night.

Offline
WORKS ANYWHERE

Planes, remote areas, secure environments — no WiFi needed.

Installation

Getting Ollama running takes about two minutes. Go to ollama.com/download, grab the macOS build, drag it to Applications, and open it. You'll see a llama icon appear in your menu bar — that means the server is running.

Then open Terminal and pull your first model:

ollama run gemma4:e4b

It downloads the model (around 9.6GB for the 4B version) and drops you straight into a chat. That's it. No config files, no Python environments, no Docker.

Best Models for Apple M-Series

The right model depends entirely on how much RAM your Mac has. Every M-series chip — from M1 to M5 — runs these well.

MODEL RAM REQUIRED BEST FOR SPEED
llama3.2:3b 8GB+ Quick chats, fast answers Instant
llama3.3:8b 16GB+ General use, writing, Q&A Fast
qwen2.5-coder:7b 16GB+ Coding — best for dev work Fast
qwen3:14b 24GB+ Coding + reasoning, multilingual Moderate
gemma4:e4b 16GB+ Everyday tasks, well-rounded Fast
gemma4:12b 32GB+ Complex reasoning, long context Moderate

Add a Chat UI

Chatting in Terminal works, but if you want a proper interface — something that looks like ChatGPT — install Open WebUI. It runs in your browser and connects directly to your local Ollama instance.

pip install open-webui
open-webui serve

Then open localhost:8080 in Safari. Full chat UI, conversation history, model switching — all local.

Is it as good as ChatGPT or Claude?

Honestly? Not quite — at least not at the 8B parameter scale. The frontier models from Anthropic and OpenAI are significantly more capable for complex reasoning, nuanced writing, and long-context tasks. But for day-to-day things like drafting emails, summarising text, answering questions, or helping with code? A local 8B model gets surprisingly close.

"Claude and ChatGPT for anything important or complex. Ollama for quick, private, offline, and unlimited tasks. They're complementary, not competing."

Final Thoughts

Running AI locally felt like a power-user thing a year ago. Now, thanks to Ollama and Apple Silicon, it's genuinely accessible — and fast enough to be part of your daily workflow. If you've got an M-series Mac, there's no reason not to have it running alongside your cloud AI tools.