I Built a Utility Brain for Coding Agents | Igor Gridel
Last Tuesday I asked Claude Code to compress a batch of PDFs for email. It spent two minutes researching ghostscript flags, tried three wrong combinations, then produced files that were actually larger than the originals.
I had solved this exact problem two weeks earlier. Same agent, same tool, same flags. But Claude Code does not remember what worked. Every session starts from zero. It researches the same ghostscript documentation, stumbles through the same wrong defaults, and, if I am lucky, arrives at the answer it found last time. Sometimes it finds a worse one.
This is not a Claude Code problem. It is a structural problem with how coding agents work right now. They have enormous intelligence and zero muscle memory.
## The gap nobody talks about
Coding agents are genuinely good at hard problems. Architecture decisions, complex refactors, debugging race conditions, writing parsers. The stuff that requires reasoning. But ask one to convert a PNG to WebP at 85% quality without overwriting the original, and it will spend thirty seconds reading the ImageMagick docs as if it has never seen them before. It has seen them. It just cannot remember that it has.
The same thing happens with ffmpeg flags for video compression, qpdf for PDF manipulation, pngquant for image optimization. Every session, the agent treats these as novel problems that require research. They are not novel. They are the same ten operations repeated hundreds of times, and the correct approach for each one was established years ago by people who actually use these tools daily.
I kept watching this happen. Not on hard problems. On the trivial ones. The agent would nail a complex database migration and then choke on "make this JPEG smaller." It was like working with a brilliant colleague who could not remember where the coffee machine was.
## What I actually built
I spent a week building something I started calling a utility brain. Six skill files, written in plain markdown, that give a coding agent reliable defaults for the operations it keeps researching from scratch.
The six skills: PDF, Image, Video, Audio, File Ops, and Automation.
Each one is a single SKILL.md file. No scripts, no binaries, no dependencies to install. Just markdown with YAML frontmatter that the agent reads and follows. The total pack is about 54KB. For context, that is smaller than a single typical smartphone photo.
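For readers who have not seen one, a skill file of this shape can be pictured roughly as follows. This is a hypothetical skeleton, not an excerpt from the pack: the field names `name` and `description`, the heading, and the body text are assumptions for illustration. The small Python function only shows that the format splits cleanly into metadata and instructions. It handles flat `key: value` lines and is not a real YAML parser.

```python
# Hypothetical SKILL.md skeleton; field names and body text are illustrative.
SKILL_MD = """\
---
name: pdf
description: Compress PDF files with safe defaults.
---
# PDF skill

Never overwrite the original file.
"""


def split_frontmatter(text):
    """Split a skill file into (metadata dict, markdown body).

    Handles only flat `key: value` lines, which is enough for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain markdown
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


meta, body = split_frontmatter(SKILL_MD)
print(meta["name"])  # pdf
```

Everything below the second `---` is what the agent actually reads as instructions.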
Here is what surprised me while building it. The skills are not lists of commands. They have a four-layer architecture that I did not plan but emerged from the problems I kept hitting:
**Router.** The agent reads what you asked for and figures out which operation you need. "Make this smaller for email" routes to lossy compression with specific quality targets. "Archive these" routes to something different.
**Domain logic.** This is where the actual knowledge lives. Quality presets, format-specific rules, the things that a developer who works with ghostscript every day just knows. The defaults are not generic. They are the specific values that produce good results for specific use cases. Ebook PDF compression uses different settings than print-ready compression, and both are different from "I need to email this."
**Execution.** The agent detects which tools are installed on your machine, picks the best available one, and falls back to alternatives if the first choice is missing. If ghostscript is not installed, it tells you how to install it. It does not silently fail or try a worse approach.
**Output contract.** Safety rules. Originals are never overwritten. Lossy operations are announced before they run. Destructive operations do a dry-run first and show you what will happen. I added these after the agent helpfully deleted my source files during an early test. Twice.
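The skills themselves are prose that the agent interprets, not code, but three of the layers can be pictured in Python. This is a minimal sketch under stated assumptions: the keyword rules, the operation names `compress-lossy` and `archive`, and the helper names `route`, `pick_tool`, and `safe_output_path` are all made up for illustration and do not come from the pack.

```python
import shutil
import tempfile
from pathlib import Path

# Router: map phrases in a request to an operation name.
# These keyword rules are illustrative; the real skills rely on the agent reading prose.
ROUTES = [
    (("email", "smaller"), "compress-lossy"),
    (("archive",), "archive"),
]


def route(request):
    """Return the first operation whose keywords appear in the request."""
    text = request.lower()
    for keywords, operation in ROUTES:
        if any(word in text for word in keywords):
            return operation
    return None  # unknown intent: ask the user instead of guessing


def pick_tool(candidates, which=shutil.which):
    """Execution: return the first installed tool from an ordered preference list."""
    for tool in candidates:
        if which(tool):
            return tool
    return None  # nothing installed: the caller should print install instructions


def safe_output_path(src, suffix="_out", ext=None):
    """Output contract: build a sibling path that does not exist yet.

    Assumes a non-empty suffix, so the result can never be the source file.
    """
    src = Path(src)
    ext = ext or src.suffix
    candidate = src.with_name(f"{src.stem}{suffix}{ext}")
    counter = 1
    while candidate.exists():
        candidate = src.with_name(f"{src.stem}{suffix}_{counter}{ext}")
        counter += 1
    return candidate


print(route("Make this smaller for email"))  # compress-lossy

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.pdf"
    original.write_bytes(b"%PDF-")
    print(safe_output_path(original, suffix="_email").name)  # report_email.pdf
```

Passing `which` as a parameter keeps the tool check testable on a machine where none of the tools are installed.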
## The part that changes how you think about agents
The Automation skill has something I have not seen anyone else build. Eight named presets for common batch workflows:
`prepare-for-email`, `blog-asset-pack`, `social-video-bundle`, `photo-delivery-pack`, `screenshot-doc-pack`, `asset-cleanup`, `pdf-archive-pack`, `publish-ready-images`.
You say "prepare these for email" and the agent knows that means compress PDFs to under 10MB, convert images to WebP, strip metadata, and organize the output. You do not specify any of that. The preset knows.
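One way to picture a preset is as a named list of steps that expands into a dry-run plan before anything touches a file. The sketch below is an assumption-heavy illustration: only the steps of `prepare-for-email` are described in this post, so the other seven presets are left undefined, and the step names, the file-extension sets, and the `plan` helper are hypothetical.

```python
from pathlib import Path

# Steps for prepare-for-email are paraphrased from the post.
# The step names and extension sets are assumptions made for this sketch.
PRESETS = {
    "prepare-for-email": [
        ("compress-pdf", {".pdf"}),
        ("convert-to-webp", {".png", ".jpg", ".jpeg"}),
        ("strip-metadata", None),  # None means "applies to every file"
        ("organize-output", None),
    ],
    # Named in the post but not specified there, so left undefined here.
    "blog-asset-pack": None,
    "social-video-bundle": None,
    "photo-delivery-pack": None,
    "screenshot-doc-pack": None,
    "asset-cleanup": None,
    "pdf-archive-pack": None,
    "publish-ready-images": None,
}


def plan(preset_name, files):
    """Expand a preset into (step, file) pairs without running anything."""
    steps = PRESETS.get(preset_name)
    if steps is None:
        raise KeyError(f"no steps defined for preset {preset_name!r}")
    actions = []
    for step, extensions in steps:
        for name in files:
            if extensions is None or Path(name).suffix.lower() in extensions:
                actions.append((step, name))
    return actions


# Dry run: show what would happen, in order, before doing it.
for step, name in plan("prepare-for-email", ["deck.pdf", "shot.png"]):
    print(f"would run {step} on {name}")
```

Returning the plan as data instead of executing it is what makes a dry-run-first rule cheap to enforce.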
This is the reframe that took me a while to see. We talk about coding agents like they are junior developers who need better prompts. They are not. They are closer to a brilliant contractor who shows up to a new job site every morning with no memory of yesterday. The problem is not intelligence. The problem is that they have no defaults for common work. No muscle memory.
Skills are muscle memory. The agent does not need to be smart about ghostscript flags. It needs to remember that `-dPDFSETTINGS=/ebook` is the right call for email attachments and that the quality tradeoff is acceptable. That is not reasoning. That is experience, compressed into 315 lines of markdown.
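As a concrete picture of that remembered default, here is a minimal sketch that builds the Ghostscript command line without running it. The flags are standard Ghostscript options. The `email` to `/ebook` mapping comes from this post, while the `screen` and `print` rows, the function names, and the size check are assumptions for illustration, not the contents of the PDF skill.

```python
# Map a use case to one of Ghostscript's built-in -dPDFSETTINGS presets.
# "email" -> /ebook is from the post; the other two rows are assumptions.
PDF_SETTINGS = {
    "email": "/ebook",
    "screen": "/screen",
    "print": "/printer",
}


def gs_compress_command(src, dst, use_case="email"):
    """Return the argv list for a Ghostscript PDF compression run."""
    if str(src) == str(dst):
        raise ValueError("refusing to overwrite the original")
    return [
        "gs",  # on Windows the console binary is typically gswin64c
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={PDF_SETTINGS[use_case]}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        str(src),
    ]


def worth_keeping(original_bytes, compressed_bytes):
    """Guard against the failure from the intro: output larger than input."""
    return compressed_bytes < original_bytes


print(" ".join(gs_compress_command("in.pdf", "in_email.pdf")))
```

The list can be handed to `subprocess.run` as-is. Comparing file sizes afterwards catches the case where "compression" made the file bigger.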
## Why markdown
I tried writing these as shell scripts first. It was a mistake for two reasons.
Shell scripts are brittle. They assume a specific OS, specific tool versions, specific directory structure. The moment someone runs them on Windows instead of Mac, half the commands break.
Shell scripts are opaque. The agent executes them but does not understand what they do. If something fails, it cannot adapt because the logic is hidden inside a bash file it is just running.
Markdown skills are transparent. The agent reads the logic, understands the intent, and can adapt when something unexpected happens. If ImageMagick is not installed but ffmpeg is, the Image skill's fallback chain handles it. A shell script would just fail.
The line counts tell the story. PDF: 315 lines. Image: 273. Video: 318. Audio: 374. File Ops: 374. Automation: 299. That is the entire utility brain, under 2,000 lines total. All of it human-readable, all of it editable with a text editor, no build step, no compilation, nothing to run.
And because they are plain markdown, they work everywhere. Claude Code, Codex, OpenCode, Gemini CLI, Cursor. Any agent that can read a file can use them.
## What I actually use it for
I run a Patreon where I sell ComfyUI workflows for visual content creation. That means I handle a lot of image conversion, PDF packaging, video compression, and file organization. Every week.
Before the utility brain, each of those operations involved the agent researching the same tools it had researched last week. After, I say "prepare this batch for Patreon delivery" and the Automation skill's photo-delivery-pack preset handles the entire pipeline. Same quality every time. No research loop. No wrong flags.
The time I saved is not dramatic, maybe fifteen minutes a day. But the consistency changed everything. I stopped checking the agent's work on routine operations because the output contract guarantees originals are preserved and lossy decisions are announced. I can trust the boring stuff and focus on the parts where the agent's intelligence actually matters.
That is what utility skills do. They move the floor up. The agent still thinks about hard problems. It just stops pretending that "convert PNG to WebP" is a hard problem.
## The thing I would tell someone building their own
Start with the operations you repeat. Not the clever ones. The boring ones you keep explaining to the agent like it is hearing them for the first time, because it is.
Write them as markdown, not as scripts. Let the agent read your intent, not just your commands. Include fallback chains for tools because not every machine has the same setup. And add safety rules early, before the agent teaches you why you need them by deleting something important.
The four-layer pattern (router, domain logic, execution, output contract) was not something I designed upfront. It is what emerged when I kept fixing the same categories of failure. The agent would misroute a request. Fixed that with clearer intent detection. It would use wrong defaults. Fixed that with domain-specific presets. It would fail silently when a tool was missing. Fixed that with fallback chains. It would overwrite originals. Fixed that with output contracts.
Each layer exists because of a specific thing that went wrong.
## Where to get it
The utility skills pack is on my Patreon, alongside the ComfyUI workflows and other tools I build. Or build your own. The architecture is all here, and the tools it wraps (ghostscript, ffmpeg, ImageMagick, qpdf, cwebp, pngquant) are all free and open source. The value is not in the tools. It is in knowing which flags to use and when, which is exactly the thing agents keep forgetting.