[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
-
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Paper • 2506.18898 • Published • 34
-
Tar
48 • Unified MLLM with Text-Aligned Representations
-
Tar
3 • Unified MLLM with Text-Aligned Representations
-
Tar
60 • Unified MLLM with Text-Aligned Representations