---
frameworks: PyTorch
license: Apache License 2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
- Qwen/Qwen-Image-Layered
base_model_relation: finetune
---
# Qwen-Image-Layered
## Model Introduction
This model was trained from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro) dataset. It extracts image layers whose content is controlled by a text prompt.
For more details on the training strategy and implementation, see our [technical blog](https://modelscope.cn/learn/4938).
## Usage Tips
* The model architecture was changed from multi-image output to single-image output; it only outputs the layer described by the text prompt
* The model was trained on English text only, but it inherits Chinese understanding from the base model
* The native training resolution is 1024x1024; inference at other resolutions is also supported
* The model struggles to separate entities that occlude each other, such as the cartoon skeleton and the hat in the example below
* The model is good at separating poster layers but weaker on photographic images, especially photos with pronounced lighting and shadows
* Negative prompts are supported; use them to describe content that should not appear in the result
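Since the native resolution is 1024x1024 but other resolutions work, one simple heuristic for choosing an inference size is to keep the input's aspect ratio while matching the native pixel budget. This is an illustration, not part of the official pipeline; the round-to-multiple-of-32 step is an assumption, not a documented requirement:

```python
def pick_resolution(src_width, src_height, native=1024, multiple=32):
    """Scale (src_width, src_height) so the pixel count is close to
    native*native, then round each side to the nearest multiple."""
    scale = (native * native / (src_width * src_height)) ** 0.5
    round_to = lambda v: max(multiple, round(v * scale / multiple) * multiple)
    return round_to(src_width), round_to(src_height)

# A 16:9 source keeps its aspect ratio at roughly the native pixel budget:
print(pick_resolution(1600, 900))  # → (1376, 768)
```

Pass the resulting values as `height` and `width` to the pipeline call shown in the inference code below.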
## Showcase
**Some images contain pure-white text. ModelScope users: click "☀︎" in the top-right corner of the page to switch to dark mode**
### Example 1
<div style="display: flex; justify-content: space-between;">
<div style="width: 30%;">
|Input Image|
|-|
||
</div>
<div style="width: 66%;">
|Prompt|Output|Prompt|Output|
|-|-|-|-|
|A solid, uniform color with no distinguishable features or objects||Text 'TRICK'||
|Cloud||Text 'TRICK OR TREAT'||
|A cartoon skeleton character wearing a purple hat and holding a gift box||Text 'TRICK OR'||
|A purple hat and a head||A gift box||
</div>
</div>
### Example 2
<div style="display: flex; justify-content: space-between;">
<div style="width: 30%;">
|Input Image|
|-|
||
</div>
<div style="width: 66%;">
|Prompt|Output|Prompt|Output|
|-|-|-|-|
|蓝天,白云,一片花园,花园里有五颜六色的花||五彩的精致花环||
|少女、花环、小猫||少女、小猫||
</div>
</div>
### Example 3
<div style="display: flex; justify-content: space-between;">
<div style="width: 30%;">
|Input Image|
|-|
||
</div>
<div style="width: 66%;">
|Prompt|Output|Prompt|Output|
|-|-|-|-|
|一片湛蓝的天空和波涛汹涌的大海||文字“向往的生活”||
|一只海鸥||文字“生活”||
</div>
</div>
## Inference Code
Install DiffSynth-Studio:
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
Model inference:
```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch, requests

# Load the layered-control transformer together with the text encoder,
# VAE, and processor from their respective model repositories.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"

# Download the example input and resize it to the native 1024x1024 resolution.
input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
input_image.save("image_input.png")

# Extract the single layer described by the prompt.
images = pipe(
    prompt,
    seed=0,
    num_inference_steps=30, cfg_scale=4,
    height=1024, width=1024,
    layer_input_image=input_image,
    layer_num=0,
)
images[0].save("image.png")
```
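Because every output layer is an RGBA image, extracted layers can be recombined by standard Porter-Duff "over" compositing (this is what `PIL.Image.alpha_composite` implements for whole images). A minimal per-pixel sketch, independent of the pipeline, to show the math:

```python
def over(src, dst):
    """Composite RGBA pixel src over dst (straight alpha, 0-255 channels)."""
    sa, da = src[3] / 255, dst[3] / 255
    oa = sa + da * (1 - sa)                       # resulting alpha coverage
    if oa == 0:
        return (0, 0, 0, 0)                       # both pixels fully transparent
    blend = lambda s, d: round((s * sa + d * da * (1 - sa)) / oa)
    return (blend(src[0], dst[0]),
            blend(src[1], dst[1]),
            blend(src[2], dst[2]),
            round(oa * 255))

# Half-transparent blue over opaque red:
print(over((0, 0, 255, 128), (255, 0, 0, 255)))  # → (127, 0, 128, 255)
```

In practice you would apply `PIL.Image.alpha_composite` to whole layer images, bottom layer first, to rebuild the original composite from the extracted layers.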