Alan

wizardII

wizard-III

AI & ML interests

RL & LLM

Recent Activity

upvoted a paper 16 days ago

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published 21 days ago • 153

New activity in deepseek-ai/DeepSeek-V3.2 3 months ago

Unbelievable

❤️ 😎 15

#2 opened 3 months ago by moxuewang

liked a model 3 months ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated Nov 27, 2025 • 6.11k • 681

updated a collection 4 months ago

Archer2.0

Collection

5 items • Updated Oct 8, 2025 • 1

updated a model 4 months ago

Fate-Zero/Archer2.0-Code-1.5B-Preview

2B • Updated Oct 8, 2025 • 3 • 3

upvoted a paper 4 months ago

ASPO: Asymmetric Importance Sampling Policy Optimization

Paper • 2510.06062 • Published Oct 7, 2025 • 14

upvoted 2 papers 5 months ago

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17

New activity in deepseek-ai/DeepSeek-R1-0528-Qwen3-8B 5 months ago

This model was distilled using only SFT, or through a combination of SFT and RL?

👍 1

#23 opened 5 months ago by wizardII

upvoted a paper 5 months ago

Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92

updated a Space 5 months ago

README

👁

published a Space 5 months ago

README

👁

New activity in Fate-Zero/Archer2.0-Code-1.5B-Train_IS_Ratio 5 months ago

[bot] Conversion to Parquet

#1 opened 5 months ago by parquet-converter

updated 2 datasets 5 months ago

Fate-Zero/Archer2.0-Math-1.5B

Viewer • Updated Sep 26, 2025 • 70.8k • 17

Fate-Zero/Archer2.0-Code-1.5B-Train_IS_Ratio

Viewer • Updated Sep 25, 2025 • 87.6k • 9

published a dataset 5 months ago

Fate-Zero/Archer2.0-Code-1.5B-Train_IS_Ratio

Viewer • Updated Sep 25, 2025 • 87.6k • 9

liked a model 5 months ago

Fate-Zero/Archer2.0-Code-1.5B-Preview

2B • Updated Oct 8, 2025 • 3 • 3

updated a dataset 5 months ago

Fate-Zero/Archer2.0-Code-1.5B

Viewer • Updated Sep 8, 2025 • 8.87k • 194 • 1

updated a collection 5 months ago

Archer2.0

Collection

5 items • Updated Oct 8, 2025 • 1

published a dataset 5 months ago

Fate-Zero/Archer2.0-Math-1.5B

Viewer • Updated Sep 26, 2025 • 70.8k • 17
