- Octopus v2: On-device language model for super agent
  Paper • 2404.01744 • Published • 58
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
  Paper • 2404.05719 • Published • 83
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
  Paper • 2404.07972 • Published • 51
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
Shaoguang Mao
dawnmsg
AI & ML interests
None yet
Recent Activity
upvoted a paper about 12 hours ago
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
liked a model about 12 hours ago
moonshotai/Kimi-K2.5
new activity about 15 hours ago
moonshotai/Kimi-K2.5: Context Management Reproducibility?