Ming-V2 Collection: Ming is the multimodal series of any-to-any models developed by the Ant Ling team. • 14 items
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation. Paper • 2510.24821 • Published Oct 28, 2025
Ming-Omni: A Unified Multimodal Model for Perception and Generation. Paper • 2506.09344 • Published Jun 11, 2025
DINOv3 Collection: foundation models producing excellent dense features that outperform the state of the art without fine-tuning (https://arxiv.org/abs/2508.10104). • 13 items • Updated Aug 21, 2025
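The collection's claim that the dense features work "without fine-tuning" means the backbone stays frozen and its per-patch embeddings are used directly. Below is a minimal sketch of extracting those features through the Hugging Face transformers AutoModel API; the checkpoint id and image path are assumptions for illustration, not taken from the collection itself:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Assumed repo id: substitute any DINOv3 checkpoint from the collection.
repo_id = "facebook/dinov3-vitb16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)
model.eval()

image = Image.open("example.jpg")  # placeholder path: any RGB image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one embedding per token (CLS, register, and
# patch tokens); the patch-token rows are the frozen "dense features".
tokens = outputs.last_hidden_state
print(tokens.shape)  # (1, num_tokens, hidden_dim)
```

Because no gradient step ever touches the backbone, these embeddings can feed a lightweight head (linear probe, k-NN, or segmentation decoder) as-is.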
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names. Paper • 2408.00298 • Published Aug 1, 2024
The Manga Whisperer: Automatically Generating Transcriptions for Comics. Paper • 2401.10224 • Published Jan 18, 2024
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing. Paper • 2601.21957 • Published 16 days ago
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR. Paper • 2601.14251 • Published 25 days ago
PaddleOCR-VL Collection: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model. • 5 items
PaddleOCR-VL-1.5 Collection: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing. • 6 items
LightOnOCR-2 🦉 Collection: LightOnOCR-2-1B, a lightweight, high-performance end-to-end OCR model family. • 12 items
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision. Paper • 2601.19798 • Published 18 days ago
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models. Paper • 2512.24618 • Published Dec 31, 2025
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs. Paper • 2509.25916 • Published Sep 30, 2025