Ollama | Local AI | Zed Editor

Run a Local AI Model on Your Mac & Never Worry About API Costs

May 20, 2026 | 5 min read

If you've been experimenting with AI tools to run your consulting practice, you've probably noticed the meter running. Every prompt you send to Claude or GPT-4 costs something. It adds up fast, especially when you're using AI as a daily thinking partner.

There's a different way to work. You can run a capable AI model directly on your Mac, for free, with no internet required once the model is downloaded. This guide sets up Qwen3 14B through a tool called Ollama, then wires it into Zed, a fast modern code and text editor. Once it's done, you have a private, always-available AI assistant that works even on a plane.

What You're Actually Installing

Two things, working together.

Ollama is like a local server manager for AI models. Think of it as the engine room. It runs quietly in the background, hosts the model, and listens for requests on a specific address on your computer.

Qwen3 14B is the actual AI model. It's made by Alibaba's research team, it's genuinely capable, and it's free to run locally. The "14B" refers to 14 billion parameters, which is a rough measure of the model's size and capacity. It's not the biggest model in the world, but it's more than enough for everyday thinking work.

Before You Start

You'll need a Mac with enough memory to run this. Qwen3 14B works best with 16GB RAM or more. You'll also need Homebrew installed. Homebrew is a package manager for macOS, which just means it's a tool that installs other software from the command line. If you don't have it, head to brew.sh and follow the one-line install at the top of the page.
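If you'd like to check both prerequisites from the Terminal first, here is a minimal sketch. It assumes macOS, where `sysctl -n hw.memsize` reports installed RAM in bytes; `bytes_to_gib` is just a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole GiB (integer division).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# hw.memsize is the macOS sysctl key for installed RAM in bytes.
# On other systems the lookup fails and the RAM check is skipped.
mem_bytes=$(sysctl -n hw.memsize 2>/dev/null || true)
if [ -n "$mem_bytes" ]; then
  echo "Installed RAM: $(bytes_to_gib "$mem_bytes") GiB (this guide suggests 16 or more)"
else
  echo "Could not read hw.memsize; check About This Mac instead"
fi

# Confirm Homebrew is on the PATH.
if command -v brew >/dev/null 2>&1; then
  echo "Homebrew found: $(brew --version | head -n 1)"
else
  echo "Homebrew not found; see brew.sh"
fi
```

If the RAM figure comes back under 16, the model may still load, but expect it to feel sluggish.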

You'll also need Zed installed. You can grab it at zed.dev. It's free.

Step 1: Install Ollama

Open your Terminal app (search for it in Spotlight with Cmd+Space) and paste this in:

Script 1

brew install ollama

Hit Enter and let it run. Homebrew will download and install Ollama automatically. When you see your prompt return, you're done with this step.

Step 2: Start the Ollama Server

brew services start ollama

This starts Ollama as a background service on your Mac. It will start again automatically whenever you log in, so you won't need to babysit it. It runs on a local address, http://localhost:11434, which is basically a door on your own computer that, by default, nothing outside can access.
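The service can take a moment to come up. If you want to wait for it rather than guess, something like the sketch below should work. `wait_for_ollama` is a hypothetical helper, not part of Ollama; it simply polls the local address with curl until it answers or the attempts run out.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it responds.
# $1 = URL, $2 = number of attempts, $3 = seconds to sleep between attempts
wait_for_ollama() {
  url=$1; tries=$2; delay=$3
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: quiet, -f: treat HTTP errors as failures, --max-time: don't hang
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

if wait_for_ollama http://localhost:11434 5 1; then
  echo "Ollama is answering on port 11434"
else
  echo "Ollama did not answer; try: brew services start ollama"
fi
```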

Step 3: Download the Model

Script 2

ollama pull qwen3:14b

This downloads the Qwen3 14B model to your machine. It's a large download (roughly 9 GB), so give it a few minutes depending on your connection. Wait until you see "success" in the terminal before moving on.

Step 4: Check That Everything Is Running

Run these two commands to confirm nothing went sideways:

Script 3

# Confirm server is running
curl http://localhost:11434

# Confirm model is available
curl -s http://localhost:11434/api/tags | python3 -m json.tool

The first one should return a short message, "Ollama is running". The second one prints a list of the models you've downloaded, formatted as JSON. Look for qwen3:14b in the output. If it's there, you're in good shape.
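If you'd rather not scan the JSON by eye, the check can be scripted. This is a rough sketch, assuming the /api/tags response lists each model with a "name" field, as the output above does; `has_model` is a made-up helper that does a plain text match rather than real JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper: reads /api/tags JSON on stdin and succeeds
# if a model with exactly the name in $1 is listed.
has_model() {
  # Strip whitespace so pretty-printed and compact JSON both match.
  tr -d '[:space:]' | grep -qF "\"name\":\"$1\""
}

if curl -s --max-time 5 http://localhost:11434/api/tags | has_model "qwen3:14b"; then
  echo "qwen3:14b is installed"
else
  echo "qwen3:14b not found (or the server is not reachable)"
fi
```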

Step 5: Connect It to Zed

This is where you plug the model into your editor. Zed uses a settings file to configure everything, including which AI model to talk to.

Open Zed, then press Cmd+Shift+P and type "open settings". Select "zed: open settings" from the dropdown. This opens a file called settings.json. You can also find it manually at:

~/Library/Application Support/Zed/settings.json
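Before you change anything in that file, it may be worth keeping a copy. A minimal sketch, assuming the default path above; `backup_settings` is a hypothetical helper name, and the timestamped `.bak` suffix is just one possible naming choice.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to a timestamped backup next to it.
# Prints the backup path on success; returns 1 if the file is missing.
backup_settings() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "No settings file at $src; nothing to back up" >&2
    return 1
  fi
  dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
  cp -p "$src" "$dest" && echo "$dest"
}

# Default Zed settings location on macOS, as given in this guide.
backup_settings "$HOME/Library/Application Support/Zed/settings.json" || true
```

If an edit goes wrong, copying the backup over settings.json puts you back where you started.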

Add this block to the file, inside the outermost pair of curly braces. If you already have a language_models or agent section, you'll need to merge carefully instead of adding a second copy. If the file is mostly empty, paste this in between the braces:

Script 4

"language_models": {
  "ollama": {
    "api_url": "http://localhost:11434",
    "available_models": [
      {
        "name": "qwen3:14b",
        "display_name": "Qwen3 14B (Mac Ollama)",
        "max_tokens": 32768,
        "supports_tools": true
      }
    ]
  }
},
"agent": {
  "default_model": {
    "provider": "ollama",
    "model": "qwen3:14b"
  }
},

Save the file. You're telling Zed: "When I open the AI panel, talk to the model running on my own machine instead of sending anything to the cloud."

Step 6: Test It in Zed

Press Cmd+Shift+A to open the Agent Panel in Zed. Look at the model dropdown in the panel. It should show "Qwen3 14B (Mac Ollama)".

Switch to "Write" mode and type a test prompt, something like "Summarize what you can do for a solo consultant." If you get a response, everything is working.
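If Zed stays silent, it helps to find out whether the problem is in Zed or in Ollama. One way is to send the same prompt straight to Ollama's /api/generate endpoint from the Terminal. This is a sketch only: `build_payload` is a hypothetical helper, and it assumes the prompt contains no quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: build a minimal /api/generate request body.
# $1 = model name, $2 = prompt (assumed to contain no quotes or backslashes)
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

payload=$(build_payload "qwen3:14b" "Summarize what you can do for a solo consultant.")

# "stream": false asks for a single JSON reply instead of a token stream.
curl -s --max-time 120 http://localhost:11434/api/generate -d "$payload" \
  || echo "No response from Ollama; is the server running?"
```

If this returns a JSON reply with a "response" field, the model is fine and the issue is most likely in your settings.json.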

Controlling the Server

You don't need to manage this constantly, but here are the commands when you need them:

Script 5

brew services start ollama    # Start server (auto-starts on login)
brew services stop ollama     # Stop server and free port 11434
ollama stop qwen3:14b         # Unload model from memory only

The third one is useful if the model is loaded into memory and you want to free up RAM without stopping the whole server.
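To see whether a model is currently sitting in memory, `ollama ps` lists what's loaded. The small wrapper below is a sketch; `show_loaded` is a made-up name, and the fallback messages are only there so the script says something useful when Ollama isn't installed or running.

```shell
#!/bin/sh
# Hypothetical helper: list models currently loaded in memory,
# with a readable message if the CLI or server is unavailable.
show_loaded() {
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama CLI not found"
    return 0
  fi
  ollama ps 2>/dev/null || echo "Ollama server not reachable"
}

show_loaded
```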

If You Have a Second Machine

This guide is specifically for running Qwen3 14B locally on your Mac. If you also have a more powerful Windows machine available, there's a separate setup for running a larger model (Qwen3.6-35B-A3B) on that machine via something called llama.cpp, then accessing it over a network tunnel from your Mac.

That setup gives you more horsepower when you need it. The Mac setup in this guide is your fallback when that machine is off or unavailable.

Switching between them is just a matter of updating one section of your settings.json and making sure the right server is running. The source doc covers both configurations in full.

What You Just Built

You now have a private AI assistant running entirely on your Mac. No API keys. No monthly bill for prompts. No data leaving your machine.

If you're building out a more intentional setup for your consulting practice, tools like Qaibits are designed to sit on top of workflows like this. The idea is to help you systematize how you run your business, not just how you chat with AI. Worth exploring once your local model is humming along.