Build & Defend Newsletter 12

One email a week - something from which I hope you'll get real value. We talk about things we can build, and how to defend them. That can apply to cybersecurity, physical buildings, digital products, and... just about anything. It gives me a lot of latitude in what I can write about, but the two concepts are important for progress - as individuals, and as a society.

Today's topic is: Local AI Models

Perhaps it's the missiles and drones that were raining down on my newly adopted home over the last couple of months, but I've spent a lot of my time lately in the technology world having conversations about resilience, disaster recovery, business continuity, and sovereignty. A lot of people have seen what's happening in the Middle East and decided that they should prepare - what if a rogue nation near them (or even one far away) decides to start attacking? What if something else happens? How can we plan for resilience? I've mentioned before having your WIFE - water, internet, food, and energy, specifically in that order so it spells "WIFE" and I can make those puns - but we can take it a bit further than that, too.

🔨 BUILD: Your Local Models

The week before last (last week was about creation and just how much you can actually get done these days) we discussed setting up an AI agent for your personal use, and exactly how to do that. (If you didn't read it, see the newsletter website linked below.) Basically it comes down to building your isolation environment (Lume for macOS virtualisation), macOS itself, the AI harness (OpenClaw, though I will investigate Hermes soon), and then the AI. Most of the time the AI will run in the cloud. That's fine, but then you run across things like this online:

When you run your software someplace else, or under someone else's control, you are of course subject to their actions, even when nothing is your fault. Many times you can't even get hold of a rational, thinking person to help you; you get shunted down an AI voice menu or an online chatbot instead. (I'm aware of the irony there.) So as a general rule, in line with your WIFE, you probably want to consider how to have your own AI locally.

The downsides: a local AI obviously takes up resources, is actually local (not cloud-based, though you can approximate that), and has to be maintained by you on reasonably decent hardware. (Chances are you'll have to buy something, but not always.) The other thing is that local models don't always have the feature sets or speed you're used to from the Claudes and ChatGPTs of the world.

However, there are of course upsides: it is local, so you have full control. While it may not be cloud-based, you can make it available via things like Tailscale, so it can approximate cloud-based. (You could just put it directly on the internet, but I highly, HIGHLY discourage you from doing that.) Also, while local models may not fully reach the capabilities of cloud-based models, they are often more than adequate for what most people will do, especially if that's text-based. Local models can handle coding and tool-calling, and can have highly structured memories as well. The other benefit is monetary - you don't have to pay $20 to $200 (or more) per month to an IntaaS company. (I'm hereby coining "Intelligence as a Service" if it's not already.)
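To make the "private network yes, open internet no" advice a bit more concrete, here is a minimal Python sketch of a guard you might run before letting a local model server listen on an address. It assumes Tailscale is handing out IPv4 addresses from its default 100.64.0.0/10 range; the function name and the idea of checking the address this way are mine, not something Ollama or Tailscale ships.

```python
import ipaddress

# Assumption: the tailnet uses Tailscale's default IPv4 range (100.64.0.0/10).
TAILNET_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_safe_bind_address(address: str) -> bool:
    """Return True only for loopback or tailnet addresses.

    Everything else is rejected, including 0.0.0.0 (which listens on every
    interface) and ordinary LAN addresses. That is deliberately strict.
    """
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip in TAILNET_RANGE
```

As far as I know, Ollama takes its listen address from the OLLAMA_HOST environment variable, so a check like this could sit in whatever script sets it. Treat it as a seatbelt, not a firewall.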

So how do you do that? With the usual caveat that I use Mac and Linux and tend to stay away from Windows, it's actually shockingly simple.

First, you're going to use a program called "Ollama". Yes, everything in the AI world is some variation of an English word where the letters "LLM" appear. I'm not sure why (us?) autists like wordplay like that so much, but we do, so you'll just have to get used to it....

Ollama is available at ollama.com. For Windows you can download and install the executable in much the same way you do with any other piece of software. You can do that for Mac as well, but you can also run the install command in the Terminal or your Linux shell (there's no download for Linux):

curl -fsSL https://ollama.com/install.sh | sh

You can install it with brew on macOS too:

brew install ollama

Then it downloads. Then it installs. Then you're done.

(I installed it on my laptop while I was writing this.)

(One thing to note: if you install the command line version you'll interact with it on the command line. If you install the package version, you'll get a GUI as well. It really doesn't matter which you use; I prefer the command line one personally.)

Okay, so we have Ollama, now how do we get the models? Well, in the GUI, you just choose one. In the shell, you issue the command:

ollama pull [modelname]
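If you end up wanting more than one model, you can script the same command. This is a rough Python sketch that simply shells out to the pull command above; the helper names are hypothetical and the model names in the comment are placeholders for whatever you pick.

```python
import subprocess


def pull_command(model: str) -> list[str]:
    """Build the argument list for `ollama pull <model>`."""
    # Refuse names that look like flags or contain whitespace, so a typo
    # can't turn into an unexpected command-line option.
    if not model or model.startswith("-") or any(c.isspace() for c in model):
        raise ValueError(f"suspicious model name: {model!r}")
    return ["ollama", "pull", model]


def pull_models(models: list[str]) -> None:
    """Pull each model in turn, stopping at the first failure."""
    for model in models:
        subprocess.run(pull_command(model), check=True)


# Example (placeholder names, needs Ollama installed):
# pull_models(["mistral", "phi3"])
```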

Which model is the next question. There are a lot of them out there. The biggest names are: Gemma 4, Qwen, Minimax, K2, Llama 4, Deepseek, Mistral, and Phi-3. Then there are specialised ones that allow you to do things like speech-to-text (and vice-versa), image generation, video generation, etc. But we'll stick with regular models for now.

So once you decide which model you want, you can run it. In the GUI, just chat in the chat box (& you can switch models if you have more than one, which you can do easily!); on the command line just type:

ollama run [modelname]

Then you can interact with it like you might with any other model or service.
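The chat box and the command line aren't the only ways in. Ollama also runs a small web service on your machine, so your own scripts can talk to the model. Here is a minimal Python sketch using only the standard library. It assumes Ollama is running at its default local address (http://localhost:11434) and that you've already pulled the model you name; the function names are my own.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one complete JSON reply instead of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return its text reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (needs Ollama running and the model pulled):
# print(ask("mistral", "Give me three uses for a local model."))
```

The generous timeout is on purpose: local models on modest hardware can take a while to answer.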

One thing to note: Ollama is much more than just this - you can interact with cloud models too, and Ollama has its own services as well, which you can use much like ChatGPT or Claude. In this case, I'm giving you the easiest way to get something local up and running so you have your local capability as soon as possible.

One other thing to note: as of the most recent couple of versions, Ollama is designed to take advantage of Mac hardware. Apple Silicon (the fancy name for the chips inside Macs) is built with machine learning in mind, Apple has its own framework for it ("MLX"), and Ollama can now take advantage of that.

🛡️ DEFEND: Your Local Models

I mentioned specialty models, and the ability to have more than one. You can do all sorts of fun things once you start mixing and matching agents and models ("Talos, run this in Claude Code." "Talos, use Gemma4 on Ollama on modelprime to write this.") with things like ACP (Agent Client Protocol), but it can also get fairly involved fairly quickly. I tend to separate the things I need to do into various modalities; I use Claude Code for coding rather than running it through Talos, for instance, because it's simpler not to add a middle "man" at this stage. Once you start building AI companies you might want that, but it requires a bit more planning than we're doing just yet.

On the defence side, another thing to be aware of is where your model comes from before you install it. In many cases the code for the model will be open source, so you can check it (and other people will already have). But check beforehand, and be aware that model producers have their own agendas, capabilities, and ways of doing things; if those aren't compatible with yours, you'll probably want to pick a different model.

From the ones I mentioned above: Gemma 4 (Google, US), Qwen (China), Minimax (China), K2 (China), Llama 4 (Meta, US), Mistral (France), Phi-3 (Microsoft, US). If you want to avoid US models, you probably don't want Llama 4. If you're avoiding Chinese models, K2 is not the one to grab. You can check out the Ollama website and the wonderfully-named Hugging Face website for more information. Personally I use Gemma4:31b on my Mac Studio and Gemma4:26b on my laptop.
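If you keep several models around, you can turn that origin list into a quick check. This is only a sketch: the origins are the ones listed above, while the name fragments (how a model's name is spelled on your machine) and the function names are my assumptions. Anything it doesn't recognise comes back as "unknown", which means "go and look it up", not "safe".

```python
import re

# Origins as listed in the text. The keys are lowercase name fragments and
# are an assumption about how model names are spelled locally.
MODEL_ORIGINS = {
    "gemma": "US",
    "qwen": "China",
    "minimax": "China",
    "k2": "China",
    "llama": "US",
    "mistral": "France",
    "phi": "US",
}


def origin_of(model_name: str) -> str:
    """Guess a model's origin from its name, or return "unknown"."""
    # Split "kimi-k2:latest" into ["kimi", "k2", "latest"] and match each
    # piece from its start, so "dolphin" is never mistaken for "phi".
    tokens = re.split(r"[-:/._]", model_name.lower())
    for fragment, origin in MODEL_ORIGINS.items():
        if any(token.startswith(fragment) for token in tokens):
            return origin
    return "unknown"


def allowed_models(model_names, avoid):
    """Keep models whose origin is known and not in the avoid set."""
    return [
        name for name in model_names
        if origin_of(name) not in avoid and origin_of(name) != "unknown"
    ]
```

You could feed it the names that `ollama list` prints, for example `allowed_models(names, avoid={"China"})`.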

That's another way to defend your "localness" as well when it comes to AI - you can install it on a laptop which you bring with you. I've now replicated everything from my Mac Studio on my laptop so I can have another stack wherever I am. StarQ is my OpenClaw running in macOS in Lume, and Gemma4:26b runs as the local LLM.

(StarQ decided he was falcon-like since he was 'born' in Dubai in the UAE.)

💰 STACK: Your Specialties

When considering all this, first figure out what you want to do. Need to prepare your taxes? It's probably a bit of overkill to spend $5,000 on a Mac Studio and install OpenClaw. Claude Cowork probably works better. Writing your doctoral dissertation and you need to do a lot of research, writing, and LaTeX layout? A local model on some decent hardware makes a lot of sense. Once you know what you want it for, align the hardware and software, and your outcomes will follow.
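For the "align the hardware" part, a back-of-the-envelope sum helps. The sketch below estimates how much memory a model's weights alone would need. It assumes the number in a name like "31b" means billions of parameters and that the model is stored at 4 bits per weight (a common compressed format, but an assumption here). It ignores the extra memory the model needs while it's actually working, so treat the result as a floor, not a budget.

```python
def weights_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Rough size of the model weights alone, in gigabytes (10^9 bytes).

    billions of parameters * bits per weight / 8 bits per byte
    """
    return params_billions * bits_per_weight / 8


# A 31-billion-parameter model at 4 bits per weight:
# weights_gb(31)      -> 15.5
# The same model at 16 bits per weight:
# weights_gb(31, 16)  -> 62.0
```

If the number that comes out is already close to the memory in the machine you're eyeing, pick a smaller model or a bigger machine.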

🔗 LINKS

Ollama

www.ollama.com

MLX on Ollama

www.ollama.com/blog/mlx

Hugging Face

www.huggingface.co

OpenClaw

www.openclaw.ai

Claude / Claude Code / Claude Cowork / Claude Design

ChatGPT

Mastering the Command Line for AI: 50 Essential Commands for the Age of AI (my book)

https://amzn.to/4dAliAS

Previous newsletters: newsletter.builddefend.fyi

💬 ONE THING

In the interest of transparency, since I mentioned that would be the case: when it comes to passive income, it's gone up quite a bit because we're now renting out our London flat. However, I don't want to count that money because it's not based on things I created. That said, real estate rental returns are actual passive income, so I'm a bit torn….

Thanks for reading this newsletter! Feel free to respond any time.

Thomas

Was this forwarded to you? Subscribe at builddefend.fyi.
