I think the reason Chinese labs are so close to the frontier, and the reason so much compute is going towards inference, is mostly that none of these «frontier» guys have a confident idea for a $1B pretraining run. They don't know WHAT to scale that far.