I thought about posting this paper but rebranding it as the Claude Mythos technical report. As far as I can tell, there are no secret tricks the US frontier labs have, and the paper basically describes how Mythos was trained. What’s in that paper just works, and for verifiable domains, it’s only a matter of fixing bugs and scaling up. That’s why Anthropic is so desperate for regulatory capture: AI has no moat.

AI (and any form of search) has this property where you spend exponentially more money to get linear returns. So for a bit we’ll live in an era where AI can in theory solve very hard problems, but it’s very expensive to do so.
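As a toy illustration of that shape (not a measured scaling law), assume each extra point of "score" costs a fixed multiple of the previous one. The function names, the base cost of 1, and the factor of 10 are all made up for the sketch.

```python
import math

def cost_for_score(score, base_cost=1.0, factor=10.0):
    """Hypothetical model: every +1 of score multiplies the cost by `factor`."""
    return base_cost * factor ** score

def score_for_cost(cost, base_cost=1.0, factor=10.0):
    """Inverse of the model above: score grows with the log of the money spent."""
    return math.log(cost / base_cost, factor)

# Linear steps in score, exponential steps in cost.
for score in range(5):
    print(score, cost_for_score(score))
```

Under this assumed model, going from a score of 3 to 4 costs ten times more than going from 2 to 3, which is the "possible in theory, very expensive in practice" regime.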

The Internet has been fully mined, and it yielded 20T good tokens. For a Chinchilla-optimal model, that’s only 1T weights (a 1e26 FLOP training run if dense). 500 GB gets you all of human knowledge in a simple-to-query archive. For comparison, Wikipedia is 24 GB with mediocre compression.
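Here is a quick sanity check of those numbers. The 20 tokens per weight and 6 FLOPs per weight per token are the usual rules of thumb, and 4 bits per weight is an assumption: it is what makes 1T weights come out to 500 GB. None of these constants are stated in the text, so treat this as a rough sketch.

```python
TOKENS = 20e12              # 20T good tokens (from the post)
TOKENS_PER_PARAM = 20       # assumption: common Chinchilla rule of thumb
FLOPS_PER_PARAM_TOKEN = 6   # assumption: standard dense training estimate
BYTES_PER_PARAM = 0.5       # assumption: 4-bit weights

def chinchilla_budget(tokens):
    """Return (weights, training FLOPs, archive size in GB) for a dense model."""
    params = tokens / TOKENS_PER_PARAM
    flops = FLOPS_PER_PARAM_TOKEN * params * tokens
    size_gb = params * BYTES_PER_PARAM / 1e9
    return params, flops, size_gb

params, flops, size_gb = chinchilla_budget(TOKENS)
print(f"{params:.1e} weights, {flops:.1e} FLOPs, {size_gb:.0f} GB")
```

With these assumptions the run comes out to about 1.2e26 FLOPs, which rounds to the 1e26 figure. At 16 bits per weight the same model would be 2 TB instead.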

Technology proceeds in S-curves, and AI has gone through a few of them already. I know I’m quite late to this, but I’m feeling optimistic that scaling will mostly stop yielding results. GPT 5.5 is at a point where it’s really hard for me to stump it with any problem. What does “superhuman intelligence” even mean at that point, if humans can’t tell whether it’s superhuman?

There will be some domains where it’s still detectable. Any form of optimization where the humans can marvel at how low it got the number qualifies. And there will be creepy Medusa systems that directly optimize for engagement; be careful not to look at them directly. But what does it mean for a song to be superhuman? Contrary to the beliefs of the rationality cult, most things aren’t optimization problems. The whole hard problem is determining what to optimize for.

The era where scaling yields clearly better AI is over. Now we enter an era of efficiency and taste. Let’s get the tools to hit the end of this S-curve distributed to as many people as possible. Taste is an arena where tons of people can play.