arxiv:2602.12670
Xiangyi Li
xdotli
AI & ML interests
None yet
Recent Activity
upvoted a paper about 13 hours ago
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
upvoted a paper 5 days ago
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
submitted a paper 5 days ago
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks