Meta just shared a preprint on using RL to post-train LLMs for generative ads on Facebook that drove up ad performance by 6.7%.

• First RL-trained LLM deployed in Facebook’s ad system
• Used ad click-through rates as the RL reward signal to fine-tune the ad text
• RL model outperformed the supervised baseline on ad performance with a +6.7% CTR

Metric-driven post-training at this scale opens doors for broader applications. Curious to see where this goes next.