I changed an implementation with GPT-5 this weekend and used it for the higher-level design, and it was really good. Then I decided to let it try to implement the design, and the Elixir code was bizarre, Ruby-inflected, and awful, so I went back to Claude for the implementation.
Perry E. Metzger, Aug 11 at 22:57
I’ve been seeing a bunch of people talking about how they have found GPT-5 to be a step down from previous models. I can’t speak to their experience, but my own has been the opposite. Yesterday I had GPT-5 Thinking design a complicated domain-specific language for me for specifying the semantics of machine instructions in CPUs. (The purpose of the thing is to make it easier to create and maintain emulators for old computers, which is a hobby of mine.) The LLM wasn’t perfect but did an excellent job and demonstrated some real creativity at the task. The resulting DSL is excellent and would have required a ton of work if I had done it by hand. I haven’t fully finished working with the LLM on the specification, but when I do, I suspect that it’s going to be able to one-shot most of the compiler. Perhaps for some people the thing is working poorly; I don’t know what they are trying to do. I also can’t speak to their tastes on things like “personality”, and besides, my system prompt implicitly asks the model to be ultra-professional and bland. (I also have no interest in the AI providing me with companionship or emotional support and do not want to have a parasocial relationship with it.) For me, it certainly is doing a really good job, and on an extremely complicated technical task. Note that I understand the work that I am asking the system to do extremely well, I am in a position to catch mistakes that it is making and correct them, and I consider a job that is 95% done after a few iterations to be a really good outcome because I can correct the rest. Perhaps if you’re trying to have the thing one-shot a working circuit design and you know no electrical engineering, the thing is still not doing what you want. For me, though, it’s an amazing tool, and is a distinct improvement over o3.
But in terms of grasping what I was trying to do, and coming up with a great pattern for it, and a high-level plan, it was better than anything I've seen so far. It was only the lower-level coding that it whiffed -- usually with coding LLMs the opposite is true.