熱門話題
#
Bonk 生態迷因幣展現強韌勢頭
#
有消息稱 Pump.fun 計劃 40 億估值發幣,引發市場猜測
#
Solana 新代幣發射平臺 Boop.Fun 風頭正勁
為什麼我們不能把時間戳放入消息中,讓模型知道「好吧,我大概還有15秒,可能還有幾個步驟。」

2025年8月10日
I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case.
For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searchers, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries.
This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc.
Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
我們可以大概用 Claude 代碼鉤子來做到這一點
6.98K
熱門
排行
收藏