The new GPT-5 performs worse than Opus 4.1 in Stagehand evals on both speed and accuracy. The smaller models are faster but still fall short of Opus 4.1.