The Arena-Hard eval environment is now merged into Atropos - enjoy scalable, flexible, and modern evaluation with @lmsysorg's Arena-Hard benchmark, which is great at measuring a wide range of model capabilities. It's also ready to be an RL environment if you bring your own train set :) Learn more at