FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
This model was introduced in the paper FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection (arXiv:2601.03928).
| Model | Backbone | 🤗 HuggingFace |
|---|---|---|
| FocusUI-3B | Qwen2.5-VL-3B | https://huggingface.co/yyyang/FocusUI-3B |
| FocusUI-7B | Qwen2.5-VL-7B | https://huggingface.co/yyyang/FocusUI-7B |
| FocusUI-2B | Qwen3-VL-2B | https://huggingface.co/yyyang/FocusUI-Qwen3-VL-2B |
For the training and evaluation data, see FocusUI-Training-Data and UI-Grounding-Benchmarks.
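The checkpoints listed in the table above can be fetched locally before use. The following is a minimal sketch, not an official loading recipe: it assumes the `huggingface_hub` package is installed, derives each repository id from the table's URL, and only downloads the files. The helper names `repo_id` and `download_checkpoint` are hypothetical, and how the weights are then loaded for inference is not covered here.

```python
from urllib.parse import urlparse

# Checkpoint URLs copied from the model table.
CHECKPOINT_URLS = {
    "FocusUI-3B": "https://huggingface.co/yyyang/FocusUI-3B",
    "FocusUI-7B": "https://huggingface.co/yyyang/FocusUI-7B",
    "FocusUI-2B": "https://huggingface.co/yyyang/FocusUI-Qwen3-VL-2B",
}


def repo_id(model_name: str) -> str:
    """Turn a table entry into a Hub repo id such as 'yyyang/FocusUI-3B'."""
    path = urlparse(CHECKPOINT_URLS[model_name]).path
    return path.strip("/")


def download_checkpoint(model_name: str) -> str:
    """Download the repository snapshot and return its local directory."""
    # Imported lazily so repo_id() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(model_name))
```

Calling `download_checkpoint("FocusUI-3B")` would return the path of the cached snapshot; network access is required for the first download.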
@article{ouyang2025focusui,
title = {FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection},
author = {Ouyang, Mingyu and Lin, Kevin Qinghong and Shou, Mike Zheng and Ng, Hwee Tou},
year = {2025},
journal = {arXiv preprint},
}