RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought Paper • 2506.04277 • Published Jun 4, 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing Paper • 2504.17828 • Published Apr 24, 2025
SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding Paper • 2603.16124 • Published 25 days ago • 2
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 2 days ago • 72
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 5 days ago • 222
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published 5 days ago • 29
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published 24 days ago • 94
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents Paper • 2601.16973 • Published Jan 23 • 40
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Paper • 2601.18631 • Published Jan 26 • 48
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published Jan 28 • 112
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 148
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 215