rankings
MTEB Leaderboard 🥇 • Embedding Leaderboard • Running on CPU Upgrade • 7.06k
Open LLM Leaderboard 🏆 • Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 13.9k
Reward Bench Leaderboard 📐 • Explore RewardBench model rankings and scores • Running • 421
models for agents
RLHFlow/ArmoRM-Llama3-8B-v0.1 • Text Classification • Updated Sep 23, 2024 • 15.9k • 183
NexaAI/Octopus-v2 • Text Generation • Updated May 21, 2024 • 437 • 890
NousResearch/Nous-Hermes-2-Yi-34B • Text Generation • Updated Feb 20, 2024 • 8.13k • 254
NousResearch/OLMo-Bitnet-1B • Text Generation • Updated Apr 11, 2024 • 22 • 119
datasets
argilla/distilabel-intel-orca-dpo-pairs • Viewer • Updated Aug 7, 2025 • 12.9k • 3.08k • 181
m-a-p/Code-Feedback • Viewer • Updated Feb 26, 2024 • 66.4k • 1.23k • 229
argilla/ultrafeedback-binarized-preferences-cleaned • Viewer • Updated Dec 11, 2023 • 60.9k • 3.03k • 160
kyujinpy/orca_math_dpo • Viewer • Updated Apr 12, 2024 • 15.3k • 64 • 19
papers
Self-Rewarding Language Models • Paper • 2401.10020 • Published Jan 18, 2024 • 152
BitNet: Scaling 1-bit Transformers for Large Language Models • Paper • 2310.11453 • Published Oct 17, 2023 • 106
ReFT: Representation Finetuning for Language Models • Paper • 2404.03592 • Published Apr 4, 2024 • 101
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published Dec 12, 2023 • 260