Claude 4.7 Gets Dumber, Open Models Get Smarter

Source: andrewzuo.com

TL;DR

Andrew Zuo's opinion piece critiques Anthropic's Claude Opus 4.7 release for higher token consumption and apparent quality drops versus Opus 4.6, while praising open-source models' gains on arenas like LLM Arena. He questions why users stick with Claude amid these issues and rising open alternatives. The post appeared hours after Opus 4.7's April 2026 launch amid user reports of regressions on benchmarks like Thematic Generalization (80.6% to 72.8%) and NYT Connections.[[4]](https://www.reddit.com/r/singularity/comments/1snlp29/claude_opus_47_high_unexpectedly_performs)

Details and context

Zuo spotted the issues via posts on Claude Code Camp (measuring 1.325x weighted tokens on code samples) and Bill Chambers' leaderboard (+36.9% tokens for 4.7).[[5]](https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you)[[2]](https://tokens.billchambers.me/leaderboard) Because billing is token-based, these multipliers raise costs for heavy users and fill context windows faster, which matters for the long agent tasks Anthropic targets.
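To make the impact concrete, here is a minimal sketch of that arithmetic. The function and the dollar/context figures are hypothetical placeholders for illustration, not Anthropic's actual pricing or the article's numbers; only the ~1.37x multiplier comes from the leaderboard data above.

```python
def cost_and_context(base_tokens: int, multiplier: float,
                     price_per_mtok: float, context_window: int):
    """Estimate the effect of a token-multiplier change.

    Returns (new token count, extra cost in dollars for the same
    content, and the fraction of the context window now consumed).
    """
    new_tokens = base_tokens * multiplier
    extra_cost = (new_tokens - base_tokens) / 1_000_000 * price_per_mtok
    fill_ratio = new_tokens / context_window
    return new_tokens, extra_cost, fill_ratio

# A prompt that was 100k tokens under the old tokenizer, at a
# placeholder $15 per million input tokens, in a 200k-token window.
tokens, extra, fill = cost_and_context(100_000, 1.37, 15.0, 200_000)
print(f"{tokens:,.0f} tokens, +${extra:.2f} per request, "
      f"{fill:.1%} of context used")
```

The same content now costs more per request and leaves less headroom in the window, which is the "stealth price hike" framing the cited posts use.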

The regressions appear tied to safety tweaks, adaptive reasoning defaulting to low effort (saving compute), and a training focus on coding over general tasks like long-context retrieval and generalization.[[7]](https://www.reddit.com/r/singularity/comments/1sn52vp/claude_opus_47_benchmarks)[[4]](https://www.reddit.com/r/singularity/comments/1snlp29/claude_opus_47_high_unexpectedly_performs) Official benchmarks highlight coding wins, but third-party evals reveal drops, echoing past "nerfs" in Claude updates.

Open models are gaining through rapid releases (e.g., Qwen 3.6, DeepSeek V3.x) and arena rankings, where cost-free local runs appeal. The article claims no direct causation, but its timing follows the Opus 4.7 complaints.[[3]](https://medium.com/@impure/claude-is-getting-dumber-and-open-models-are-getting-smarter-so-why-havent-you-switched-yet-f0bef0eddb80)

Why it matters

Claude's stumbles highlight the risks of relying on proprietary AI, where updates can degrade usability despite the hype. For developers and businesses, this means weighing higher costs and inconsistent performance against free, improving open models for many tasks. Watch independent benchmarks like LMSYS Arena or Artificial Analysis for signs of Opus 4.7 stabilizing, and for upcoming open releases such as a potential Llama 4, though real-world fit varies by use case.[[8]](https://artificialanalysis.ai/)

What changed

Opus 4.7, launched April 16, 2026, uses roughly 37% more tokens than Opus 4.6 and scores lower on the same third-party tests:[[9]](https://medium.com/your-latest-ai-learnings/claude-opus-4-7-is-a-huge-pile-of-trash-6e389d152a83)[[2]](https://tokens.billchambers.me/leaderboard)

| Benchmark | Opus 4.6 | Opus 4.7 |
| --- | --- | --- |
| Thematic Generalization | 80.6% | 72.8% |
| NYT Connections | 94.7% | 41% |
| MRCR 1M | 78.3% | 32.2% |

FAQ

Q: Why does Claude Opus 4.7 use more tokens than 4.6?

A: Its new tokenizer processes English and code at 1.3-1.47x the previous rate, per code samples and leaderboard data from 688 runs showing a +37% average, raising costs with no change in list price. This also hits context limits and slows responses sooner. The author calls it a stealth price hike.[[5]](https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you)[[2]](https://tokens.billchambers.me/leaderboard)

Q: What benchmarks show Opus 4.7 worse than 4.6?

A: Thematic Generalization dropped to 72.8% from 80.6%, NYT Connections Extended to 41% from 94.7% (with refusals), and MRCR v2 1M to 32.2% from 78.3%; custom SaaS tasks also regressed per Openmark.ai tests. Official coding benchmarks improved.[[4]](https://www.reddit.com/r/singularity/comments/1snlp29/claude_opus_47_high_unexpectedly_performs)

Q: How are open models getting smarter per the article?

A: One open model ranks 7th on the LLM Arena leaderboard, outperforming DeepSeek 3.2 and Mistral; amid Claude's issues, the article frames this as a shift in which open models close the gap cost-effectively. No specific benchmarks beyond the arena are cited.[[3]](https://medium.com/@impure/claude-is-getting-dumber-and-open-models-are-getting-smarter-so-why-havent-you-switched-yet-f0bef0eddb80)

Q: Why hasn't the author switched from Claude yet?

A: The article poses the question rhetorically and gives no personal answer in the visible text; it implies inertia despite the costs and regressions versus open-model gains, and questions readers' habits.[[1]](https://andrewzuo.com/claude-is-getting-dumber-and-open-models-are-getting-smarter-so-why-havent-you-switched-yet-f0bef0eddb80)