Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
Recent Activity
Upvoted a paper about 13 hours ago
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research
Upvoted a paper 20 days ago
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Upvoted a paper about 2 months ago
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning