Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published Oct 10, 2025
One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings Paper • 2503.03008 • Published Mar 4, 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14, 2025