How to Set Up a Free Coding Agent on Your Machine in 10 Minutes | Igor Gridel
I pay for Claude Code. I use it every day, I've built skills for it, and I think it's worth the money. I'm saying this upfront because this post is going to show you how to get a coding agent for free, and I don't want you wondering whether I actually believe in the paid version. I do. But not everybody needs it, not everybody can afford it, and not everybody wants their codebase sent to a server they don't control.
If that's you, this is the setup.
Three pieces of software, all free, all open source. You install them, connect them, and ten minutes later you have a coding agent running in your terminal that can read your files, write code, run commands, and help you build things. Your files stay on your machine. No API key. No credit card. No trial that expires in fourteen days.
## What you're actually installing
**Ollama** runs AI models locally. Think of it as a model server sitting on your laptop. You pull a model the same way you'd pull a Docker image, and it handles all the inference. Free, open source, one command to install.
**Gemma 4** is the model. Google DeepMind released it on April 2, 2026 under Apache 2.0, which means you can use it for anything, commercially or personally, no restrictions. The 26B parameter variant uses a Mixture of Experts architecture that only activates 3.8 billion parameters per inference. That means a 26 billion parameter model runs with the memory footprint of a much smaller one. It scores 77.1% on LiveCodeBench v6 (competitive coding) and 82.3% on GPQA Diamond (graduate-level science questions). For a free local model, those numbers are absurd.
**OpenCode** is the agent. Open source, 140,000+ stars on GitHub, built by the anomaly.co team. It's a terminal-based coding agent that connects to whatever AI backend you point it at. Claude, GPT, Gemini, or in our case, a local Ollama server running Gemma 4. It reads your project files, suggests edits, runs commands. The full agent experience, just powered by a model running on your own hardware.
## Step 1: Install Ollama
Go to [ollama.com/download](https://ollama.com/download) and grab the installer for your OS. Mac, Windows, Linux, all supported.
On Mac or Linux, you can also run:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Once installed, Ollama runs as a background service. You can verify it's working with:
```bash
ollama --version
```
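The version flag only checks the CLI. To confirm the background service is actually listening, you can hit its local HTTP API directly; Ollama serves on port 11434 by default and exposes a version endpoint. A minimal sketch (the `extract_version` helper is mine, just a sed one-liner for the JSON response):

```shell
# Ask the local Ollama service for its version over HTTP.
# The service listens on port 11434 by default.
response=$(curl -s http://localhost:11434/api/version || true)

# Pull the version number out of the JSON, e.g. {"version":"0.6.2"} -> 0.6.2
extract_version() {
  echo "$1" | sed -n 's/.*"version":"\([^"]*\)".*/\1/p'
}

if [ -n "$response" ]; then
  echo "Ollama is up, version $(extract_version "$response")"
else
  echo "No response on port 11434; is the Ollama service running?"
fi
```

If you get no response, start the app (Mac/Windows) or the service (`ollama serve` on Linux) before moving on.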
## Step 2: Pull Gemma 4
This is where you choose your model size. Two realistic options:
**If you have 24GB+ RAM** (most desktops, some high-end laptops):
```bash
ollama pull gemma4:26b
```
This is the 26B MoE variant. The best balance of capability and hardware requirements. It activates only 3.8B parameters per inference, so it runs faster than you'd expect from a 26 billion parameter model.
**If you have 8GB RAM or less** (older laptops, budget machines):
```bash
ollama pull gemma4:e4b
```
This is the E4B variant, 4 billion parameters. It won't be as capable, but it'll run smoothly on almost anything and still handle basic coding tasks, file operations, and simple refactors.
The download will take a few minutes depending on your connection. The 26B model is around 18GB; the E4B is around 10GB.
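Before you kick off an 18GB download, it's worth a quick disk check; Ollama stores model blobs under your home directory. A rough pre-flight sketch (the sizes are the approximate figures from above, and the location assumption is the default `~/.ollama` store):

```shell
# Rough pre-flight check: is there room for the model download?
required_gb=18   # use 10 for the E4B variant

# Free space (in whole GB) on the filesystem holding your home directory,
# where Ollama keeps its model store by default.
free_gb=$(df -Pk "$HOME" | awk 'NR==2 {printf "%d", $4 / 1024 / 1024}')

if [ "$free_gb" -ge "$required_gb" ]; then
  echo "OK: ${free_gb}GB free, need about ${required_gb}GB"
else
  echo "Not enough space: ${free_gb}GB free, need about ${required_gb}GB"
fi
```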
## Step 3: Fix the context window (do not skip this)
This is the gotcha that wastes people's time. Ollama defaults every model to a 4,096 token context window. Gemma 4 supports 128K tokens on the E2B and E4B variants, 256K on the 26B and 31B, but Ollama doesn't care. It gives you 4K unless you explicitly tell it otherwise.
4K tokens is roughly one medium-sized file. For a coding agent that needs to read your project structure, understand multiple files, and maintain conversation context, 4K is useless. You'll get responses that cut off mid-thought, forget what you asked three messages ago, or just fail silently because the model ran out of room.
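To get a feel for how small 4K really is, you can estimate a file's token count from its byte size. The common rule of thumb is roughly 4 characters per token; that's a heuristic, not a tokenizer, but it's close enough for budgeting:

```shell
# Rough token estimate for a file, using the ~4 characters per token
# rule of thumb (a heuristic, not a real tokenizer).
estimate_tokens() {
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# A 16KB source file is already ~4,096 tokens, i.e. the entire default
# context window is gone before the model has said a single word.
printf 'x%.0s' $(seq 1 16384) > /tmp/medium_file.txt
echo "Estimated tokens: $(estimate_tokens /tmp/medium_file.txt)"
```

Run that against a few files in your own project and you'll see why the default is a non-starter for agent work.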
Create a file called `Modelfile` (no extension) in any directory:
```
FROM gemma4:26b
PARAMETER num_ctx 32768
```
If you pulled the E4B instead, use `FROM gemma4:e4b`.
Then create the custom model:
```bash
ollama create gemma4-agent -f Modelfile
```
Now you have a model called `gemma4-agent` with a 32K context window. On machines with 24GB+ RAM you can push this to 65536 or even 131072, but 32K is the sweet spot where you get enough context for real agent work without crushing your memory.
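That sizing advice can be sketched as a small helper that writes the Modelfile for you. The RAM thresholds here are my assumptions based on the guidance above, not anything Ollama enforces, so tune them to your machine:

```shell
# Pick a context window from available RAM (thresholds are assumptions
# based on the guidance above; adjust for your hardware).
pick_ctx() {
  ram_gb=$1
  if   [ "$ram_gb" -ge 48 ]; then echo 65536
  elif [ "$ram_gb" -ge 24 ]; then echo 32768
  else echo 16384
  fi
}

# Write a Modelfile for the given base model and context size.
make_modelfile() {
  base=$1; ctx=$2
  cat > Modelfile <<EOF
FROM $base
PARAMETER num_ctx $ctx
EOF
}

make_modelfile "gemma4:26b" "$(pick_ctx 24)"
cat Modelfile   # prints: FROM gemma4:26b / PARAMETER num_ctx 32768

# Build the custom model (only works if ollama is installed):
command -v ollama >/dev/null && ollama create gemma4-agent -f Modelfile \
  || echo "ollama not found; install it first"
```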
Check that the parameter took:
```bash
ollama show gemma4-agent
```
Look for `num_ctx` under the parameters in the output. Don't bother asking the model what its context window is; models don't reliably know their own runtime settings.
## Step 4: Install OpenCode
Check [opencode.ai](https://opencode.ai) for the latest install method. As of April 2026:
**Mac/Linux (recommended):**
```bash
curl -fsSL https://opencode.ai/install | bash
```
**npm (any platform):**
```bash
npm i -g opencode-ai@latest
```
**Windows (via Scoop):**
```bash
scoop install opencode
```
**Mac (via Homebrew):**
```bash
brew install anomalyco/tap/opencode
```
Verify the install:
```bash
opencode --version
```
## Step 5: Point OpenCode at your local model
OpenCode needs to know where your model lives. Create or edit the config file at `~/.config/opencode/opencode.json`:
```json
{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4-agent": {
          "name": "Gemma 4 Agent (local)"
        }
      }
    }
  }
}
```
The `baseURL` points to Ollama's local API. OpenCode talks to it using the OpenAI-compatible protocol, which Ollama supports out of the box.
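You can sanity-check that endpoint yourself before involving OpenCode at all. This sends one OpenAI-style chat request to Ollama's `/v1` API; it assumes the service is running and that you built the `gemma4-agent` model in Step 3:

```shell
# Build a minimal OpenAI-style chat request for the custom model.
cat > /tmp/chat_request.json <<'EOF'
{
  "model": "gemma4-agent",
  "messages": [
    {"role": "user", "content": "Reply with the single word: pong"}
  ]
}
EOF

# Send it to Ollama's OpenAI-compatible endpoint. If the service is not
# running, print a hint instead of failing.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/chat_request.json \
  || echo "No response; is the Ollama service running?"
```

If you get a JSON response back with a `choices` array, the plumbing OpenCode depends on is working.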
## Step 6: Use it
Navigate to any project directory and launch OpenCode:
```bash
cd your-project
opencode
```
On first run, select the Ollama provider and the gemma4-agent model. Then just talk to it.
Try something simple first:
> "Read the files in this directory and tell me what this project does."
Then try something practical:
> "Find all PNG images in this project and list their file sizes."
> "Write a bash script that converts all PNG files to WebP format."
> "Look at my package.json and tell me which dependencies are outdated."
If you're getting coherent, useful responses, it's working. The model is running entirely on your hardware, your files never leave your machine, and you paid nothing.
## What you'll notice (honest take)
I'm not going to pretend this is equivalent to Claude Code or Cursor with Claude 4 behind it. It isn't. A local model with 3.8 billion active parameters is not going to match a frontier model with orders of magnitude more compute. You will notice the difference on complex multi-file refactors, on subtle architectural decisions, on tasks that require holding a lot of context at once.
But for a huge amount of daily coding work, it's genuinely good. File operations, simple scripts, refactoring single files, generating boilerplate, explaining code, converting formats. The stuff that takes you five minutes of tedious typing but doesn't require deep reasoning. Gemma 4 handles that well.
And for anyone who cares about privacy, there is no alternative that matches this. Your code, your files, your conversations, all of it stays on your machine. No server. No logs. No terms of service that might change next quarter.
## The skills angle
I built a set of utility skills that work with OpenCode, Claude Code, Codex, all of them. They're on my Patreon. But honestly, with the setup above and the commands I share in my other posts, you can do most of it for free. The skills save you time. The setup in this post saves you money. Pick whichever matters more to you right now.