HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing • Paper • 2602.03560 • Published 11 days ago • 43
SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-REAM • Text Generation • 23B • Updated 22 days ago • 42 • 6
utter-project/EuroMoE-2.6B-A0.6B-Instruct-2512 • Text Generation • 3B • Updated 7 days ago • 140 • 6
mistralai/Voxtral-Mini-4B-Realtime-2602 • Automatic Speech Recognition • Updated about 13 hours ago • 5.21k • 513
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published Jun 2, 2025 • 188