GPT-5 isn't ready for production agentic work. Kimi might be. More receipts (as with any good accounting): I ran GPT-5 against Opus 4.1, but it took so long that I ended up running three models while waiting for GPT-5 to finish. The raw runs are a bit hyper-verbal, so here's my quick annotation:

1. Instruction-following
Asked each model to "use the TypeScript workspace provided," among other things.
-GPT-5: ignored it for 15 minutes, wrote 31 shell commands first
-Kimi: tried TypeScript immediately (failed 3x on paths but kept trying)
-Opus: TypeScript at minute 2
-Sonnet: TypeScript at minute 7

2. Error-handling (a simplify-on-retry sketch follows the post)
-GPT-5: 500-char command fails → expands it to 2000+ chars → still fails → keeps expanding
-Kimi: path error 3x → finally simplifies → works
-Opus: ~95% of the work correct on the first try
-Sonnet: tool missing → switches approach → continues

3. Unique findings (our core work, worth its own post)
-GPT-5: schema changes (RIDRETH2→RIDRETH3), naming patterns (_J suffix)
-Kimi: basic validation: SEQN exists, 9966 participants
-Sonnet: mental health data hidden in Other/, 1.4M-row files
-Opus: 86% designed sparsity, 2-323 column range

4. Code produced (a minimal validation sketch follows the post)
-GPT-5: inventory.ts with 2000+ chars of bash embedded inside
-Kimi: simple_validate.ts, minimal but it works
-Sonnet: comprehensive_analysis.ts, clean separation of concerns
-Opus: 3 modular files, an extensible framework

5. Resources
-Kimi: 14 min, $1.59
-Sonnet: 6 min, $1.87
-GPT-5: 27 min, $5.04
-Opus: 10 min, $10.46

That said, GPT-5 clearly knows a lot of technical tricks and is a pretty capable actor at baseline, but it comes with a high error margin and a risk of drifting from the point (which it did multiple times on this task). I might use it for quick debugging, but for a massive codebase or analysis task, I'd prefer Kimi with plenty of guardrails as things stand.
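
To make the error-handling contrast concrete, here's a minimal sketch of the simplify-on-retry pattern Kimi converged on: each retry drops complexity instead of adding it. The function name and candidate paths are hypothetical, an illustration of the behavior rather than any model's actual output:

```typescript
import { existsSync } from "node:fs";

// Hypothetical helper: probe progressively *simpler* path candidates.
// This mirrors Kimi's "fail 3x, then simplify" loop, the opposite of
// GPT-5's "fail, then expand the command" loop.
function firstWorkingPath(candidates: string[]): string {
  for (const p of candidates) {
    if (existsSync(p)) return p;
    // Candidate failed: fall through to the next, simpler one.
  }
  throw new Error("No candidate path worked");
}

const dataDir = firstWorkingPath([
  "./workspace/data/nhanes/2017-2018", // most specific guess
  "./data/nhanes",                     // simpler
  "./data",                            // simplest
]);
console.log(`Using data dir: ${dataDir}`);
```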
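And for what "minimal but it works" means in practice, a sketch in the spirit of Kimi's simple_validate.ts: check that the SEQN participant-ID column exists and count rows. The file path is hypothetical, and this assumes the demographics table was already exported to CSV (the real NHANES files ship as SAS .XPT):

```typescript
// simple_validate.ts (sketch): two basic checks, nothing more.
import { readFileSync } from "node:fs";

const FILE = "./data/DEMO_J.csv"; // hypothetical path, CSV export assumed
const lines = readFileSync(FILE, "utf8").trim().split("\n");
const header = lines[0].split(",").map((h) => h.trim());

// Check 1: the participant ID column exists.
if (!header.includes("SEQN")) {
  throw new Error("SEQN column missing: not a valid NHANES table");
}

// Check 2: participant count (9966 in the run above).
const participants = lines.length - 1; // minus the header row
console.log(`SEQN present; ${participants} participants`);
```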