LLMs can appear to reason well, but a single wrong token can derail the whole output. Our new work shows that token-level memorization is a key cause of failure, especially under distribution shift. Introducing: STIM 🔍🧠 🧵 #NLProc