GPT-5 results! Plus a Longform Writing update: I added new instructions to help the judge notice & penalize overuse of incoherent metaphors, & re-ran the leaderboard. It was becoming a problem, with many frontier models converging on this slop. Some ranks changed; Opus 4.1 is now #1.