GPT-5 has just been released. Let's evaluate its performance in achieving complex AGI-like capabilities:

- @grok 4 (Thinking) surpasses @OpenAI GPT-5 (High) in both the ARC-AGI-2 (complex reasoning) and ARC-AGI-1 (less demanding) tests.
- Grok 4's superior accuracy comes with significantly higher costs per task, ranging from $2 to $4.
- Lighter GPT-5 variants (mini/nano) provide a balanced trade-off between performance and cost on these benchmarks.

Please note, the ARC-AGI-3 test is currently underway, and the results of the above tests do not imply model superiority.

h/t @arcprize