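GPT-OSS performs poorly on benchmarks that require raw tool calling. For example, CORE-Bench requires agents to run bash commands to reproduce scientific papers. DeepSeek V3 scores 18%. GPT-OSS scores 11%.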
Nathan Lambert · Aug 12, 23:44
gpt-oss is a tool processing / reasoning engine only. Kind of a hard open model to use. Traction imo will be limited. Best way to get traction is to release models that are flexible, easy to use w/o tools, and reliable. Then, bespoke interesting models like tool use later