Part 1 of this series looked at the logical architecture of an agentic system: the execution, reasoning, and agency layers. Part 2 gets more concrete and more contentious: where should the thing physically run?


An old friend asked me this earlier in the week and pointed me at a podcast - the most recent Diamandis Moonshot episode, an excited discussion of the merits of running AI agents locally with guest Alex Finn, a YouTuber popular in the OpenClaw space. During the episode, the group enthusiastically recommends a Mac Mini as the obvious home base for your personal AI agent. Local hardware, local models, full control, no token costs. The energy was infectious. And my friend wanted to know if I agreed.

My instinct was fairly immediate: not really. VPS plus cloud inference endpoints, don’t overthink it.

But then I sat with that answer for a minute. There are genuinely smart people making the local case - not just hype merchants, but developers with real experience who’ve run the numbers and landed somewhere different from me. And I have enough of a systems background to know my instincts could bias me toward cloud infrastructure even when it’s not the right answer. So I decided to actually work through it rather than just assert a position.

What follows is me thinking out loud more than delivering a verdict. I’ll tell you where I land, but the reasoning matters more than the conclusion.


What the local argument actually is

First, it’s worth spending a moment discussing what people mean when they say “run it locally,” because there are two separable things that keep getting conflated:

Running the OpenClaw gateway locally: the Node.js process that manages sessions, routes messages, and executes tools. This is what gives you access to your local filesystem, your home network, your IoT devices, your iMessage on a Mac. The gateway being local has nothing to do with where the LLM is.

Running the LLM locally: the actual inference, the many gigabytes of model weights, the compute (usually a GPU). This is what eliminates token costs. It requires real hardware. And it is entirely independent of where the gateway runs.

You can run the gateway locally against cloud inference endpoints (Anthropic, Google, OpenAI, whatever) and get all the local integration benefits while still paying per token. You can also run the gateway on a VPS and point it at a cloud LLM, which is the standard cloud setup. And then there’s the full local stack: gateway on your hardware, model and inference on your hardware, nothing leaving your network.

The Mac Mini discourse tends to bundle all of this together. The actual decision tree has more branches and nuance.


The token cost problem is real

Let me steelman the local model argument before I pick at it, because the cost concern is legitimate.

Running an agent like OpenClaw 24/7 against a model like Claude Sonnet isn’t cheap. The agent has a heartbeat: it’s not just responding to your messages, it’s running background checks, monitoring things, executing scheduled tasks. That’s a sustained stream of API calls, not occasional bursts. Even Gemini Flash, about as cheap as a capable frontier model gets right now, adds up under continuous operation. We’re talking a few dollars a day at realistic usage, potentially more if your automation is busy. That’s real money; $50 to $100+ a month isn’t out of the question for a heavily-used personal agent.
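To make that concrete, a back-of-envelope sketch. Every price and volume below is an illustrative assumption, not a vendor quote; the point is that an always-on agent’s cost is dominated by input tokens, because the heartbeat resends context constantly.

```python
# Back-of-envelope monthly API cost for an always-on agent.
# All prices and volumes are illustrative assumptions, not quotes.
input_price_per_m = 0.30   # $ per 1M input tokens (cheap frontier tier)
output_price_per_m = 2.50  # $ per 1M output tokens

# A heartbeat-driven agent resends its context on every cycle, so
# input volume dwarfs output volume.
daily_input_m = 5.0        # millions of input tokens per day (assumed)
daily_output_m = 0.5       # millions of output tokens per day (assumed)

daily = daily_input_m * input_price_per_m + daily_output_m * output_price_per_m
print(f"${daily:.2f}/day, roughly ${daily * 30:.0f}/month")
```

At these assumed rates it works out to a few dollars a day and roughly $80 a month, which is consistent with the range above; a busier automation load pushes it higher.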

A local model eliminates that entirely. Once the hardware is paid for, inference is essentially free, aside from electricity, which is negligible. That’s a genuine value proposition, and dismissing it out of hand would be intellectually dishonest.

The question is whether the economics actually work out when you examine them carefully.


The hardware reality check

Here’s where the local argument starts to get complicated.

Running a local LLM that’s actually useful, capable enough to handle the reasoning, tool-calling, and structured output that an agent workload requires, takes real hardware. The community often talks about this casually, but let me be specific:

A 7B or 8B quantized model can technically run on 8GB of VRAM, but the context window you get at that configuration is limited, and the model quality shows. OpenClaw itself recommends at least 64K tokens of context for agent use. Hitting that number with reasonable performance and a capable enough model really wants 16GB of dedicated VRAM, and 24–32GB is where the experience becomes genuinely good. Below that threshold you’re making meaningful tradeoffs on either model quality or context length, and possibly both.
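A rough way to see why 16GB is the realistic floor: total memory is model weights plus the KV cache, and the cache grows linearly with context length. The layer count, head count, and head dimension below are generic 8B-class assumptions, not the specs of any particular model.

```python
# Rough VRAM budget for local agent inference: weights + KV cache.
# Architecture numbers are generic 8B-class assumptions, not measurements.

def weights_gb(params_billions, bits=4):
    """Quantized weight footprint: parameter count times bits per weight."""
    return params_billions * bits / 8  # 8B params at 4-bit is about 4 GB

def kv_cache_gb(context_tokens, layers=32, kv_heads=8, head_dim=128,
                bytes_per_value=2):
    """Keys and values cached per token, per layer; fp16 cache assumed."""
    return (2 * context_tokens * layers * kv_heads * head_dim
            * bytes_per_value) / 1e9

# The recommended 64K-token context on an 8B-class model:
total = weights_gb(8) + kv_cache_gb(64_000)
print(f"~{total:.1f} GB before runtime overhead")
```

Under these assumptions you’re already around 12GB before the inference runtime’s own overhead, which is why 8GB cards force you to shrink either the model or the context, and usually both.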

The Mac Mini gets mentioned constantly because Apple’s unified memory architecture means you can use all of it for inference - a 24GB M4 Mac Mini can run models that would need a 24GB GPU card in a PC. But unified memory shares bandwidth between CPU and GPU tasks, and model inference on Apple silicon is meaningfully slower than on dedicated VRAM at equivalent memory capacity. It’s not the performance story it gets marketed as. It’s better than CPU-only, but it’s not a substitute for a proper discrete GPU.

Now the Canadian price reality, since that’s my context: an M4 Mac Mini with 16GB of RAM and a 512GB drive, which is genuinely the minimum configuration you’d want if you’re storing model weights alongside everything else, runs about $1,099 CAD plus tax. Call it $1,250 out the door. Bump to 24GB of memory, which is where you’d want to be for running anything meaningful, and you’re just under $1,400 before tax. With tax, probably $1,600 or so depending on province.

That’s roughly three years of $50 per month. And $50 per month covers a solid VPS plus moderate API usage, possibly a lot of API usage if you’re smart about model selection and routing.
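The payback arithmetic is simple enough to sketch. The figures are the ones above: roughly $1,600 CAD out the door for the 24GB configuration, against $50 CAD a month of VPS-plus-API spend, with a small assumed electricity cost for the always-on local box.

```python
# Payback period for buying local hardware vs. ongoing cloud spend.
# Figures from the discussion above; electricity cost is an assumption.
hardware_cost_cad = 1600.0      # 24GB Mac Mini, out the door (approx.)
monthly_cloud_cad = 50.0        # VPS plus moderate API usage
monthly_electricity_cad = 5.0   # assumed cost of running the box 24/7

net_monthly_saving = monthly_cloud_cad - monthly_electricity_cad
payback_months = hardware_cost_cad / net_monthly_saving
print(f"Payback: {payback_months:.0f} months "
      f"(~{payback_months / 12:.1f} years)")
```

That comes out to about three years, before accounting for the hardware becoming a less capable model host with each year it ages.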

Three years is a long time in this space. Within that window, model capabilities will continue improving, API prices will likely continue falling (they’ve dropped dramatically over the past two years), and new hardware options will emerge. The Mac Mini you buy today is not going to be more capable than the inference options you’ll have access to via API in 2028. The hardware depreciates in capability terms even as it maintains physical function.


What the local case is actually good for

Despite all of that, there are real scenarios where local makes sense. I want to be fair to them.

IoT and local network integration. If you want your agent to interact with smart home devices, local services, or anything that lives on your home network and isn’t exposed to the internet, the gateway needs to be physically present on that network. A VPS can’t reach your Philips Hue lights or your local Plex server. This is a genuine capability gap for cloud deployments, but notice that this only requires the gateway to be local. You can run the gateway on a Raspberry Pi 5 or an old laptop against cloud inference endpoints and get everything you need here for very little hardware investment.

Serious privacy requirements. If you’re running an agent that will touch genuinely sensitive data, not just “I prefer privacy” but actual professional, legal, or compliance-level sensitivity, then keeping inference local means prompts never leave your network. That’s a real and meaningful distinction for some people in some contexts.

Already-owned hardware. If you have a machine sitting around with a decent GPU - a gaming PC with a 4090 and 24GB of VRAM, perhaps - then the economics look completely different. The hardware cost is sunk. Running Qwen 3.5 or similar locally on that machine is essentially free inference on hardware you already own, and the opportunity cost argument disappears.

The developer tinkering case. If you’re building skills, testing agent behavior, or treating this as a development environment where you want to iterate fast without API bills accumulating, local makes sense as an experimentation setup.

What these cases have in common is that they’re specific, bounded, and don’t require a Mac Mini purchase to justify. The “everyone should consider a Mac Mini for their AI agent” version of this argument is much weaker than the “here’s when local actually makes sense” version.


Separating the gateway from the model

This is probably the most underappreciated point in the whole debate, so I want to linger on it.

A significant portion of what people want from local deployment - the home network access, the IoT integration, the local filesystem capabilities, the iMessage support on macOS - doesn’t require local model inference. It requires a local gateway process. That process is lightweight. It runs fine on a Raspberry Pi 5, an old NUC, a decommissioned laptop: hardware that most technically-inclined people either own already or can easily acquire for around $200.

The gateway being local gives you:

  • Access to your home network and local devices
  • Local filesystem read/write
  • Integrations with local-only capabilities like iMessage (requires macOS, but a current-gen Mac Mini isn’t the only Mac)
  • Always-on presence without paying VPS fees

And it can do all of that while pointing at Gemini Flash or Claude Haiku for inference, models that cost fractions of a cent per thousand tokens and are significantly more capable than anything you can run locally on consumer hardware.

This configuration, a local gateway with cloud inference, doesn’t get talked about much because it doesn’t fit neatly into either the “go full local” or the “just use a VPS” camp. But for a lot of people, it’s probably a very pragmatic path.


Scale and the organizational angle

One more lens worth applying: what does this look like when it’s not just a personal setup?

I come from an enterprise architecture background, and the “local is better” argument doesn’t survive contact with large organizational scale. Few IT departments are going to procure dedicated GPU hardware for each agent workload. The TCO story falls apart, the security and compliance story gets harder (not easier) with distributed local inference, and the operational model of maintaining local model versions across an estate is genuinely painful.

At scale, the answer is obvious: cloud inference, managed infrastructure, VPS or container-based gateway deployments. The economics, the operational model, and the capability profile all point the same direction.

This doesn’t directly answer the personal use case question, but I find it useful context. When the organizational answer is unambiguous, it at least raises the question of what specifically about the personal case changes the calculus, and the honest answer is: token cost and a handful of specific local integration needs. That’s a narrower justification than the general enthusiasm suggests.


The security argument for local, and why it’s not as airtight as it sounds

There’s a version of the local argument that goes: my machine is behind a home router with no open inbound ports, so it’s inherently safer than a public-facing VPS. And that’s not wrong: a fresh VPS with OpenClaw installed is publicly addressable by default, and it needs real hardening before I’d be comfortable leaving it alone - proper firewall rules, SSH key-only access, no root login, the admin API not exposed to the world. Even my own DigitalOcean VPCs need additional work before they let me sleep at night. Most quick-start guides don’t mention any of this.

But the “local is safer” assumption has its own holes. CVE-2026-25253, a critical RCE vulnerability disclosed in January 2026, specifically targeted instances running on localhost. The attack used the victim’s browser as a pivot point, exploiting the fact that browsers don’t block cross-origin WebSocket connections to localhost. Visiting a single malicious page was enough to compromise a local OpenClaw instance that had no internet exposure whatsoever. The gateway being local didn’t help.

Ironically, a cloud deployment tunneled through Cloudflare or a VPN, which never binds to localhost at all, sidesteps that specific attack vector more cleanly than a local setup does.

Neither configuration is automatically safe. A VPS needs hardening. A local instance needs its own set of precautions. And running the agent on your daily-driver machine, which is what most getting-started guides implicitly encourage, is the worst of all worlds regardless of where it’s hosted: you’re putting a shell-access agent on the same machine as your passwords, your files, and everything else that matters. The security argument for local may be real, but it’s much narrower than the hype suggests.


Where I actually land

After working through this more carefully than my initial instinct demanded, I think my original answer was roughly right but for incomplete reasons.

For most people asking “should I get a Mac Mini to run my AI agent,” the answer is probably no, not because local is a bad idea in principle, but because the specific purchase is hard to justify on the numbers. You’re buying expensive hardware to run a less capable model than you’d get from a cheap API, with a payback period measured in years, in a landscape that’s changing fast enough to make three-year hardware bets uncomfortable.

The local model argument is really a token cost argument. And the token cost argument is most compelling if you already have the hardware or have a specific, quantifiable reason why API costs are a problem for your use case.

If you want local integrations, IoT, home network, filesystem access, a cheap local machine running the gateway against cloud endpoints is probably enough. You don’t need local inference for that.

If you want zero token costs and you have the hardware, the economics work. If you’d have to buy the hardware, run the numbers honestly before you do.

And if you genuinely need always-on automation, monitoring, scheduled jobs, something you’re depending on, that’s actually an argument for cloud, not local. A home machine goes down when your power flickers, when your ISP has a bad morning, when someone closes the lid. A VPS just keeps running. The cases that remain genuinely compelling for local are the narrower ones: sensitivity requirements that truly preclude cloud inference, local network integrations that can’t be served by a lightweight gateway, or hardware you already own. Just be specific about which of those things is actually true for you, rather than treating “it seems cool and smart people on podcasts like it” as sufficient justification.

Smart people are enthusiastic about local deployment for real reasons. But I don’t think those reasons necessarily apply to most of the people currently looking for their own agent setup.

I could be wrong. I’m genuinely uncertain about where hardware economics land in two years. But that uncertainty cuts both ways: it’s also an argument against locking into hardware today.