RETRO (DeepMind, 2021) is a beautiful idea, one badly in need of revisiting.

The central innovation of RETRO is to have a small model decide what token to predict next, but outsource all knowledge to a large offline datastore.

This has the added benefit of allowing you to insert and remove facts in a modular way by modifying the datastore, without retraining the model.

It fits the ideal of a tiny model (Karpathy’s cognitive core, yada yada) really well. You could layer on more tools, too, starting with a language datastore: that’s the Most Important tool.

RETRO deserves much more recognition, especially now that small models have gotten so much better.
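
To make the "edit the datastore, not the model" point concrete, here is a toy sketch. It is not RETRO's actual pipeline (which retrieves nearest-neighbour chunks using a frozen BERT encoder and feeds them to the model through chunked cross-attention). The `Datastore` class, the bag-of-words `embed` function, and the example facts are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (hypothetical stand-in for a frozen encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Datastore:
    """Hypothetical editable knowledge store; the model's weights are never touched."""

    def __init__(self):
        self._chunks = {}  # key -> (text, embedding)

    def add(self, key, text):
        self._chunks[key] = (text, embed(text))

    def remove(self, key):
        self._chunks.pop(key, None)

    def retrieve(self, query, k=2):
        """Return up to k stored chunks most similar to the query."""
        q = embed(query)
        scored = [(cosine(q, emb), text) for text, emb in self._chunks.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = Datastore()
store.add("capital_fr", "The capital of France is Paris.")
store.add("capital_jp", "The capital of Japan is Tokyo.")
print(store.retrieve("What is the capital of France?", k=1))

# Removing a fact is a datastore edit; nothing gets retrained.
store.remove("capital_fr")
print(store.retrieve("What is the capital of France?", k=1))
```

The small model would condition on whatever `retrieve` returns, so adding or deleting an entry changes what it can "know" on the very next query.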