5 7 5

Alan

wizardII · wizard-III

AI & ML interests

RL & LLM

Recent Activity

upvoted a paper 15 days ago

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

new activity 3 months ago

deepseek-ai/DeepSeek-V3.2: Unbelievable

liked a model 3 months ago

deepseek-ai/DeepSeek-Math-V2

upvoted a paper 15 days ago

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published 19 days ago • 152

upvoted a paper 4 months ago

ASPO: Asymmetric Importance Sampling Policy Optimization

Paper • 2510.06062 • Published Oct 7, 2025 • 14

upvoted 3 papers 5 months ago

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92

upvoted a paper 6 months ago

SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7, 2025 • 47

upvoted a paper 7 months ago

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Paper • 2507.15778 • Published Jul 21, 2025 • 21