these reasoning traces have been keeping me up at night

on the left: new OpenAI model that got IMO gold
on the right: DeepSeek R1 on a random math problem

you need to realize that since last year academia has produced over a THOUSAND papers on reasoning (probably many more). we're practically all thinking about reasoning, but all of our systems produce 'thinking traces' that look like DeepSeek's on the right. they're incredibly, obnoxiously verbose, burning through tokens at a borderline negligent rate. a lot of the reasoning is unnecessary, and some of it is completely incorrect

but the reasoning on the left, this new thing, is something else entirely. clearly a step-function change. potentially a different method entirely

it's so much closer to *actual* reasoning. no tokens are wasted. if anything, it's exceptionally terse; i'd guess human solutions are more verbose than this

clearly something very different is going on. maybe OpenAI developed a completely new RLVR training process. maybe there's some special data collection from experts. maybe they started penalizing the model for overthinking in a way that actually benefits it somehow

really fascinating stuff... in general this makes me bearish on R1-style reasoning
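to make the 'penalize overthinking' guess concrete: a minimal sketch of what a length-penalized verifiable reward might look like. this is pure speculation on my part, not anything OpenAI has described; the helper names and the `alpha` coefficient are invented for illustration

```python
import re

def extract_final_answer(trace: str) -> str:
    """Pull the final answer out of a reasoning trace (toy heuristic).

    Looks for a LaTeX \\boxed{...} answer; falls back to the last line.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", trace)
    return match.group(1).strip() if match else trace.strip().splitlines()[-1]

def shaped_reward(trace: str, reference_answer: str, alpha: float = 1e-4) -> float:
    """Verifiable-correctness reward minus a penalty proportional to length.

    Correct, terse traces score highest; correct but rambling traces score
    lower. Incorrect traces get zero regardless of length, so the penalty
    rewards brevity only among already-correct solutions rather than
    encouraging the model to guess fast.
    """
    correct = extract_final_answer(trace) == reference_answer
    length_penalty = alpha * len(trace.split())
    return max(1.0 - length_penalty, 0.0) if correct else 0.0

# e.g. a correct 200-word trace scores ~0.98, a correct 5000-word trace ~0.5
```

the design question this sketch dodges is the hard one: a naive length penalty like this could just teach the model to hide its reasoning rather than reason better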
@marlboro_andres yeah, a few:
Alexander Wei · 19.7.2025
4/N Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.