---
language:
- en
library_name: glyph-byt5
---

# Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

We introduce **Glyph-ByT5-v2**, a customized text encoder for accurate **multilingual** visual text rendering with improved aesthetics.
As an extension of **Glyph-SDXL**, the multilingual version supports visual text rendering in 10 languages: English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian.
Combining this encoder with SDXL yields **Glyph-SDXL-v2**, which renders multilingual visual text accurately in design images.

> [**Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering**](https://glyph-byt5-v2.github.io/)
> [Zeyu Liu](https://github.com/lzy-tony), [Weicong Liang](https://scholar.google.com/citations?user=QvHDIygAAAAJ&hl=zh-CN), [Yiming Zhao](https://scholar.google.com.hk/citations?user=_knPaYsAAAAJ&hl=zh-CN), [Bohan Chen](https://github.com/BHCHENGIT), [Ji Li](https://sites.google.com/a/usc.edu/jili/), [Yuhui Yuan](https://www.microsoft.com/en-us/research/people/yuyua/)
> Microsoft Research Asia; Tsinghua University; Peking University; University of Liverpool
> Preprint

## Model Sources

- **Repository:** [https://github.com/AIGText/Glyph-ByT5](https://github.com/AIGText/Glyph-ByT5)
- **Paper:** [https://arxiv.org/abs/2406.10208](https://arxiv.org/abs/2406.10208)
- **Project Page:** [https://glyph-byt5-v2.github.io/](https://glyph-byt5-v2.github.io/)

## Model Description

Please see our [paper](https://arxiv.org/abs/2406.10208) and [project page](https://glyph-byt5-v2.github.io/) for more details. Detailed usage instructions and inference code are available in the [GitHub repository](https://github.com/AIGText/Glyph-ByT5).

## Visualization

<table>
<tr>
<td><img src="assets/teaser/teaser_multilingual_1.webp" alt="example 1" width="200"/></td>
<td><img src="assets/teaser/teaser_multilingual_2.webp" alt="example 2" width="200"/></td>
<td><img src="assets/teaser/teaser_multilingual_3.webp" alt="example 3" width="200"/></td>
<td><img src="assets/teaser/teaser_multilingual_4.webp" alt="example 4" width="200"/></td>
</tr>
</table>

## Quick Usage

```
python inference_v2.py configs/glyph_sdxl_v2_albedo.py checkpoints examples/xiaoman.json --out_folder work_dirs/xiaoman --device cuda --sampler dpm
```

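The command above can also be assembled programmatically, for example to queue one run per annotation file in a folder. The following is a minimal sketch and not part of the repository: the helper names `build_command` and `build_batch` are hypothetical, and reading the three positional arguments as config file, checkpoint folder, and annotation file is an inference from the example paths.

```python
from pathlib import Path


def build_command(annotation, out_folder,
                  config="configs/glyph_sdxl_v2_albedo.py",
                  checkpoints="checkpoints",
                  device="cuda", sampler="dpm"):
    """Return the argument list for one inference_v2.py run.

    The argument order copies the Quick Usage command; the meaning
    assigned to each positional argument is an assumption.
    """
    return [
        "python", "inference_v2.py",
        config, checkpoints, str(annotation),
        "--out_folder", str(out_folder),
        "--device", device,
        "--sampler", sampler,
    ]


def build_batch(annotation_dir, work_dir="work_dirs"):
    """Build one command per *.json file, writing to <work_dir>/<file stem>."""
    commands = []
    for annotation in sorted(Path(annotation_dir).glob("*.json")):
        out_folder = Path(work_dir) / annotation.stem
        commands.append(build_command(annotation, out_folder))
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues with unusual file names.
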
## More Configurations

Some additional useful options are listed below. The Location column indicates whether each one is set as a command-line argument, in the config file, or in the annotation file:

| Argument/Config               | Location   | Default                             | Description                                                  |
| ----------------------------- | ---------- | ----------------------------------- | ------------------------------------------------------------ |
| cfg                           | argument   | 5.0                                 | Classifier-free guidance scale                               |
| sampler                       | argument   | dpm                                 | Sampler; supports `dpm` (DPM++ 2M Karras) and `euler` (EulerDiscreteScheduler) |
| pretrained_model_name_or_path | config     | stablediffusionapi/albedobase-xl-20 | Base model                                                   |
| seed                          | annotation | None                                | Seed for inference                                           |

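
As an illustration of how the options in the table might be applied, the sketch below builds the extra command-line arguments and writes a copy of an annotation file with a fixed seed. It rests on stated assumptions rather than on the repository's documented interface: `--cfg` is assumed to be spelled like `--sampler` in the Quick Usage command, the annotation file is assumed to be a JSON object with a top-level `seed` key, and the helper names `extra_args` and `with_seed` are hypothetical.

```python
import json
from pathlib import Path

# The two sampler names listed in the configuration table.
SUPPORTED_SAMPLERS = {"dpm", "euler"}


def extra_args(cfg=5.0, sampler="dpm"):
    """Return command-line overrides; the `--cfg` flag name is an assumption."""
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"unsupported sampler: {sampler!r}")
    return ["--cfg", str(cfg), "--sampler", sampler]


def with_seed(annotation_path, seed, output_path):
    """Copy an annotation JSON, adding a top-level "seed" (assumed key name)."""
    annotation = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    annotation["seed"] = seed
    # ensure_ascii=False keeps non-Latin text (e.g. Chinese) readable on disk.
    Path(output_path).write_text(
        json.dumps(annotation, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return annotation
```

Writing the seeded annotation to a new file leaves the original example untouched, so the same prompt can be rerun with several seeds side by side.
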
## Citation

If you find our work useful in your research, please consider citing:

```
@misc{liu2024glyphbyt5v2,
      title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
      author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
      year={2024},
      eprint={2406.10208},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

and

```
@misc{liu2024glyphbyt5,
      title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
      author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
      year={2024},
      eprint={2403.09622},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```