SK Telecom + @AdaptiveML trained Gemma 3 4B with PPO, obtaining impressive results, especially for a model of that size. Learn more about how they did this