I only just started playing with it, and the model seems great so far. But it also has some implementation idiosyncrasies:
- A new chat protocol
- Only available in fp4 quantization
- An attention sink, which somewhat breaks fused attention

Open models move fast, and I wonder how much time to invest in supporting these features. Will OpenAI open source more models?
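For the attention-sink point above, a minimal numpy sketch of the idea: a per-head sink logit is appended to the score matrix before the softmax, so it can absorb probability mass without contributing a value vector. This is why a standard fused attention kernel (which assumes weights sum to 1 over the keys) doesn't apply directly. The function names and shapes here are illustrative, not mlx-lm's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_sink(q, k, v, sink_logit):
    # q, k, v: (seq_len, head_dim); sink_logit: a scalar per head.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (L, L)
    # Append the sink logit as an extra "virtual key" column.
    sink = np.full((scores.shape[0], 1), sink_logit)
    probs = softmax(np.concatenate([scores, sink], axis=-1))  # (L, L+1)
    # Drop the sink column: it soaks up weight but attends to nothing,
    # so the remaining weights sum to less than 1 per row.
    return probs[:, :-1] @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = attention_with_sink(q, k, v, sink_logit=2.0)
```

A fused kernel can still be used by tracking the running log-sum-exp and folding the sink logit into the final normalization, but that requires modifying the kernel's epilogue rather than calling it as-is.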
Awni Hannun, 6.8. at 12:43
OpenAI's new 120B MoE runs nicely in mlx-lm on an M3 Ultra. Running the 8-bit quant:
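A sketch of how one might try this with mlx-lm's command-line generator; the exact model repo name on the mlx-community hub is an assumption here, not taken from the post.

```shell
pip install mlx-lm

# Model repo name is illustrative -- check the mlx-community hub for the actual 8-bit quant.
mlx_lm.generate \
  --model mlx-community/gpt-oss-120b-8bit \
  --prompt "Explain attention sinks in two sentences."
```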