IDEA-Research/grounding-dino-base • Zero-Shot Object Detection • 0.2B • Updated May 12, 2024 • 2.04M • 159
camenduru/dinov3-vitl16-pretrain-lvd1689m • Image Feature Extraction • 0.3B • Updated Dec 17, 2025 • 4.57k • 1
IBBI-bio/dinov3-vitl16-pretrain-lvd1689m • Image Feature Extraction • 21.6M • Updated Aug 29, 2025 • 642 • 2
Ming-V2 • Collection • Ming is the multimodal series of any-to-any models developed by the Ant Ling team. • 11 items • Updated about 9 hours ago • 33
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation • Paper • 2510.24821 • Published Oct 28, 2025 • 40
Ming-Omni: A Unified Multimodal Model for Perception and Generation • Paper • 2506.09344 • Published Jun 11, 2025 • 30
DINOv3 • Collection • DINOv3: foundation models producing excellent dense features, outperforming SotA without fine-tuning (https://arxiv.org/abs/2508.10104) • 13 items • Updated Aug 21, 2025 • 486
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names • Paper • 2408.00298 • Published Aug 1, 2024 • 11
The Manga Whisperer: Automatically Generating Transcriptions for Comics • Paper • 2401.10224 • Published Jan 18, 2024 • 3