The 5 Models Worth Using in Clawdbot Right Now | Igor Gridel
As of today, April 4, 2026, Anthropic officially cut off Claude subscription access for third-party tools like Clawdbot. If you were using Claude Pro or Max through Clawdbot, that stopped working at noon Pacific today. You now need API keys with pay-as-you-go billing.
This has been coming since January when Anthropic started silently blocking OAuth tokens from third-party tools. They formalized it in their Terms of Service in February. OpenAI went the opposite direction and partnered with OpenCode to extend Codex subscription access to open-source tools.
So the model landscape for Clawdbot just shifted. Claude is still available, but the economics changed overnight. That makes this comparison more relevant than it would have been a week ago.
There's also something Clawdbot does that changes how you think about model comparisons in general. It separates the context window a provider advertises from the actual runtime tokens you get inside the assistant. So when MiniMax says 204K and Anthropic says 1M, those numbers are real, but what you experience during a long session is a different story.
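That gap between the advertised window and the usable budget can be sketched numerically. The overhead figures below are made-up assumptions for illustration, not Clawdbot's actual reservations:

```python
# Why an advertised context window and the runtime token budget inside
# an assistant differ: some of the window is consumed by the system
# prompt, tool definitions, and reserved output space.
# All overhead numbers here are illustrative assumptions.

def runtime_budget(advertised: int, system_prompt: int = 8_000,
                   tool_defs: int = 12_000, reserved_output: int = 16_000) -> int:
    """Tokens actually left for conversation history."""
    return advertised - system_prompt - tool_defs - reserved_output

for name, window in [("MiniMax M2.7", 204_000), ("Claude Sonnet 4.6", 1_000_000)]:
    print(f"{name}: {window:,} advertised -> {runtime_budget(window):,} usable")
```

The point isn't the exact numbers; it's that a fixed overhead eats a much larger fraction of a 204K window than of a 1M one, which is exactly the difference you feel in long sessions.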
I use Clawdbot as my main AI assistant. I've run real work through most of the models available on the platform. Not benchmark comparisons. Actual tasks, actual automation, actual multi-step workflows that get messy in the middle.
Here's where each one actually lands.
## MiniMax M2.7
This is the one that surprised me. At $0.30 per million input tokens and $1.20 per million output, it's absurdly cheap. And not cheap in a "you get what you pay for" way. MiniMax claims 97% skill adherence across 40 complex tasks, and honestly that tracks with what I've seen. It stays on rails. It follows instructions. It does the boring, important stuff consistently.
The speed helps too. Around 60 tokens per second on the standard tier, 100 on the high-speed tier. When you're running the assistant through multi-step workflows where you're waiting on the model dozens of times per session, that adds up.
The weakness is real though. When you push it toward genuinely complex reasoning or when the context fills up near the ceiling, M2.7 can terminate early or just start making worse decisions. Their own docs admit this, which I appreciate. It's not a model for "figure out this entire problem from scratch." It's a model for "execute this workflow reliably and cheaply."
If I had to describe it in one line: high floor, moderate ceiling, and you'll spend almost nothing getting there.
There's also a subscription path if you don't want pay-as-you-go. $10 a month gets you 1,500 requests per 5 hours, $20 gets 4,500, and $50 gets 15,000. Depending on how you use it, that can be even cheaper than the token pricing.
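Whether the $10 plan beats pay-as-you-go depends on your average request size. Here's a quick break-even sketch; the per-request token counts are assumptions, not measured Clawdbot figures:

```python
# Break-even between MiniMax pay-as-you-go ($0.30/M in, $1.20/M out)
# and the $10/month subscription tier.
# The assumed request size (4K in, 800 out) is illustrative.

INPUT_PRICE = 0.30 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 1.20 / 1_000_000  # dollars per output token

def payg_cost(requests: int, in_tok: int = 4_000, out_tok: int = 800) -> float:
    """Pay-as-you-go cost for N requests of an assumed average size."""
    return requests * (in_tok * INPUT_PRICE + out_tok * OUTPUT_PRICE)

# Monthly request count where pay-as-you-go crosses the $10 plan price
breakeven = 10 / (4_000 * INPUT_PRICE + 800 * OUTPUT_PRICE)
print(f"2,000 requests: ${payg_cost(2_000):.2f} pay-as-you-go")
print(f"break-even near {breakeven:,.0f} requests/month")
```

With requests this small, pay-as-you-go stays under $10 until roughly 4,600 requests a month; if your requests carry much more context than that, the subscription wins sooner.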
## Moonshot Kimi K2.5
On paper, K2.5 is incredible. 256K context. Multimodal, so it handles text and images natively. Tool use, JSON mode, internet search built in, automatic context caching. A MoE architecture with 1 trillion total parameters. And the pricing is somewhere around $0.10 to $3.00 per million tokens depending on the tier, which is almost suspiciously cheap for what they're offering.
Moonshot even publishes specific setup guides for using K2.5 in Clawdbot. They clearly want this use case.
But there's a gap between the docs and the feel. In long sessions where the context is evolving, where earlier decisions affect later ones, where the assistant needs to track priorities through noise, K2.5 doesn't always keep up. It's not that it fails obviously. It just gradually loses the thread in a way that's harder to catch than a clear error.
The multimodal and search capabilities are genuinely useful though. If your workflow involves analyzing images or pulling live information from the web, K2.5 has native support for that where other models need extra tooling.
I'd use it for workflows where the structure is clear upfront and the assistant mostly needs to execute. I'd be more cautious handing it something open-ended that might run for a while.
## Zhipu GLM-5-Turbo
This is the one I think most people are missing, and for a specific reason. Zhipu isn't just offering another model you can plug into Clawdbot. They're explicitly training GLM-5-Turbo around Clawdbot's workflow patterns. Tool calling, instruction following, persistent tasks, long-chain execution. Their docs literally describe it as optimized for these exact scenarios.
That's a different thing than a general-purpose model that happens to work with Clawdbot. It means the model's training is shaped around the kind of work you're actually doing on the platform.
200K context, 128K max output, 55+ tokens per second. The pricing runs through their GLM Coding Plan subscription rather than pure pay-as-you-go, which makes it harder to compare directly to the others. But the strategic alignment is the real story here.
The honest question I can't fully answer yet: does the real-world stability match what they're marketing? Zhipu claims their flagship GLM-5 aims at Claude Opus 4.5 level. That's aggressive. GLM-5-Turbo sits below that but is specifically tuned for the assistant workflow use case rather than raw benchmark performance.
If you want to bet on a model that's being built specifically for how Clawdbot works rather than adapted from something designed for other purposes, this is the one to watch.
## OpenAI Codex (GPT-5.3-Codex)
OpenAI calls this "the most capable agentic coding model to date," and for coding-heavy work inside Clawdbot that's not an unreasonable claim. 400K context window, 128K max output, reasoning controls from low to extreme. The API pricing is $1.75 per million input and $14 per million output, which is expensive.
But here's the economics trick. Codex is included in ChatGPT plans. Free, Plus, Pro, Business, Enterprise. The usage limits vary, but if you're already paying for ChatGPT, your marginal cost for Codex access can be effectively zero up to your limit.
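To put "expensive" in context, here's a rough session-cost sketch at those API rates. The token counts are illustrative assumptions about a heavy agentic session, not measurements:

```python
# Rough cost of one coding session at Codex API pricing
# ($1.75/M input, $14/M output). Session sizes are assumptions:
# agentic loops re-send context on every tool call, so input tokens
# dominate even though the output rate is 8x higher.

def session_cost(in_tok: int, out_tok: int,
                 in_price: float = 1.75, out_price: float = 14.0) -> float:
    """Dollar cost for one session's token usage."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# e.g. 3M input tokens across repeated tool calls, 150K output
print(f"${session_cost(3_000_000, 150_000):.2f}")  # $7.35
```

A handful of sessions like that per day adds up fast on the API, which is why covering them under a ChatGPT plan you already pay for changes the calculation completely.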
One thing worth knowing: OpenAI's own docs are slightly split. GPT-5.3-Codex is the specialized coding model, but the broader Codex product increasingly defaults to GPT-5.4 as its general-purpose workhorse. So depending on which surface you're accessing, you might be getting a different model than you think.
For pure coding and engineering work this is one of the safest picks. The trade-off is clear: if your Clawdbot use is mostly coding, Codex is strong. If it's broader, you're paying coding-model prices for a model that's specialized in one direction.
## Claude Sonnet 4.6 / Opus 4.6
I mentioned this at the top but it's worth repeating in context: as of today, you can no longer use your Claude Pro or Max subscription to power Clawdbot. Anthropic started blocking third-party OAuth tokens back in January and made it official in their Terms of Service in February. The final cutoff was today. If you want Claude inside Clawdbot now, you need to use the Anthropic API with pay-as-you-go billing. No subscription workaround.
So let's talk about what that actually costs.
Sonnet 4.6 runs $3 per million input tokens and $15 per million output tokens. Opus 4.6 is $5 input and $25 output. Both support the full 1M token context window at standard pricing, which is genuinely useful and didn't used to be the case.
There are ways to bring those numbers down. The Batch API cuts everything in half, so Sonnet drops to $1.50/$7.50 and Opus to $2.50/$12.50 per million. Prompt caching helps too: cache reads are 90% cheaper than standard input. If your Clawdbot workflows repeat a lot of context (which they usually do), caching can make a real difference. Anthropic says you can save up to 95% when combining batching and caching. In practice you won't hit 95%, but even 50-60% savings changes the math.
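Here's how those two discounts stack on Sonnet 4.6, as a sketch. The cache-hit fraction is an assumption, and the exact interaction between batch pricing and cache pricing may differ from this simplification:

```python
# How Batch API and prompt caching stack on Sonnet 4.6
# ($3/M input, $15/M output at standard rates).
# The 80% cache-hit fraction below is an illustrative assumption.

IN, OUT = 3.00, 15.00   # dollars per million tokens, standard rates
BATCH = 0.5             # Batch API halves everything
CACHE_READ = 0.1        # cache reads cost 10% of standard input

def cost(m_in: float, m_out: float, cached_frac: float = 0.0,
         batch: bool = False) -> float:
    """Dollar cost for m_in / m_out million tokens under the discounts."""
    scale = BATCH if batch else 1.0
    fresh = m_in * (1 - cached_frac) * IN          # uncached input
    cached = m_in * cached_frac * IN * CACHE_READ  # cache reads
    return scale * (fresh + cached + m_out * OUT)

base = cost(10, 2)                               # no discounts: $60
best = cost(10, 2, cached_frac=0.8, batch=True)  # 80% cache hits + batch
print(f"${base:.2f} -> ${best:.2f} ({1 - best / base:.0%} saved)")
```

Under these assumptions the combined savings land around two-thirds, short of the 95% ceiling but well into "changes the math" territory.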
Capability-wise, this is still the top. Opus 4.6 handles the hardest long-horizon work with the most reliability. Sonnet 4.6 is the everyday workhorse that covers most real tasks without needing Opus.
Where Claude genuinely earns it: complex reasoning across long contexts, understanding nuanced instructions, maintaining coherence through extended multi-step workflows. The depth of understanding is still ahead of the cheaper options.
But the access situation makes it harder to recommend casually. Before today, you could run Claude through Clawdbot on a $20/month Pro subscription and not think about token costs. Now every session has a visible price tag. Sonnet at API pricing is still reasonable for focused work. Opus at $5/$25 per million tokens adds up fast if you're using it for everything.
The argument for Claude inside Clawdbot is now purely about capability. It's the best model for the hardest problems, but the economics shifted from "included in your subscription" to "pay per token, every time." For most daily Clawdbot work, the cheaper models have caught up enough that Claude becomes something you use selectively, not as a default.
## What I'd actually use
For daily work where you want solid execution without thinking about cost: MiniMax M2.7. The reliability at that price point is hard to beat.
For someone who wants to bet on a model built specifically for this platform: GLM-5-Turbo. Zhipu is making an interesting strategic bet that's worth watching.
For coding-heavy workflows: Codex, especially if you already have a ChatGPT plan.
For the hardest, most complex tasks: Claude Opus 4.6. Pay for what you need when you need it.
For maximum features per dollar: Kimi K2.5, but go in knowing the experience won't always match the spec sheet.
The real question isn't "which model is best." It's which model fits the kind of work you're doing most of the time. For me that's MiniMax for the daily stuff and Claude when things get genuinely complicated. Everything in between depends on what you're building.