gpt-oss-120b is so good
it keeps pace with Gemini 2.5 Pro here and is 98.9% cheaper


May 26, 2025
Following our Sudoku-based reasoning benchmark announcement, we've been evaluating the latest models to track improvements in their reasoning capabilities.
Today, we’re launching the Sudoku-Bench Leaderboard: 
New technical report: 
You can now track new model progress on our live Leaderboard. Of the models we've benchmarked so far, OpenAI's o3 Mini High leads overall. Interestingly, Gemini 2.5 Pro does better on the harder 6x6 puzzles! However, o3 is the only model that solves any of the 9x9 Sudokus, and even then only 2.9% of them, all of them vanilla Sudokus.
Crucially, NO model tested can yet conquer 9x9s requiring strong, creative reasoning. This benchmark remains a grand challenge! For a deeper dive into the benchmark, methodology, and our findings, check out our technical report.
Want to test a model on Sudoku-Bench? It's simple! Visit the leaderboard. Choose a puzzle. We generate a prompt (puzzle + instructions) to paste into any model. Explore sample reasoning traces from our tests too!

> o3 is the only model that solves any of the 9x9 Sudokus
gpt-oss-120b can also solve 9×9s (1.4%). The only other model on peval that has solved any 9×9s is GPT-5.

