---
frameworks: PyTorch
license: Apache License 2.0
tags: []
tasks:
  - text-to-image-synthesis
base_model:
  - Qwen/Qwen-Image-Layered
base_model_relation: finetune
---
# Qwen-Image-Layered-Control

## Model Introduction

This model was trained from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro) dataset. The content of the extracted layer can be controlled through a text prompt.

For more details on the training strategy and implementation, see our [technical blog](https://modelscope.cn/learn/4938).

## Usage Tips

* The model architecture was changed from multi-image output to single-image output: each call produces only the layer described by the text prompt.
* The model was trained on English text only, but it still inherits Chinese understanding from the base model.
* The model's native training resolution is 1024x1024; inference at other resolutions is supported.
* The model struggles to separate multiple entities that occlude each other, such as the cartoon skeleton and the hat in Example 1.
* The model is good at decomposing poster-style layers; it is weaker on photographic images, especially photos with pronounced lighting and shadows.
* The model supports negative prompts, which describe content that should not appear in the result (see the sketch after this list).
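
A minimal sketch of the last two tips combined, assuming a `pipe` and `input_image` built as in the Inference Code section below. The `negative_prompt` parameter name is an assumption about the pipeline's CFG interface, not something confirmed by this card:

```python
# Sketch only: inference at a non-native resolution with a negative prompt.
# Assumes `pipe` and `input_image` exist as in the Inference Code section;
# `negative_prompt` is an assumed parameter name.
image = pipe(
    prompt="Text 'TRICK OR TREAT'",
    negative_prompt="skeleton, hat, gift box",  # content to keep out of the layer
    seed=0,
    num_inference_steps=30, cfg_scale=4,
    height=768, width=1344,                     # away from the native 1024x1024
    layer_input_image=input_image.resize((1344, 768)),
    layer_num=0,
)[0]
image.save("layer_text.png")
```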

## Results

**Some of the output images contain pure white text. ModelScope users: click the “☀︎” icon at the top right of the page to switch to dark mode.**

### Example 1

<div style="display: flex; justify-content: space-between;">

<div style="width: 30%;">

|Input Image|
|-|
|![](./assets/image_1_input.png)|

</div>

<div style="width: 66%;">

|Prompt|Output|Prompt|Output|
|-|-|-|-|
|A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
|Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
|A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
|A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|

</div>

</div>

### Example 2

<div style="display: flex; justify-content: space-between;">

<div style="width: 30%;">

|Input Image|
|-|
|![](./assets/image_2_input.png)|

</div>

<div style="width: 66%;">

|Prompt|Output|Prompt|Output|
|-|-|-|-|
|蓝天,白云,一片花园,花园里有五颜六色的花 (blue sky, white clouds, a garden of colorful flowers)|![](./assets/image_2_0_0.png)|五彩的精致花环 (a delicate, colorful garland)|![](./assets/image_2_2_0.png)|
|少女、花环、小猫 (girl, garland, kitten)|![](./assets/image_2_1_0.png)|少女、小猫 (girl, kitten)|![](./assets/image_2_3_0.png)|

</div>

</div>

### Example 3

<div style="display: flex; justify-content: space-between;">

<div style="width: 30%;">

|Input Image|
|-|
|![](./assets/image_3_input.png)|

</div>

<div style="width: 66%;">

|Prompt|Output|Prompt|Output|
|-|-|-|-|
|一片湛蓝的天空和波涛汹涌的大海 (an azure sky and a surging sea)|![](./assets/image_3_0_0.png)|文字“向往的生活” (the text “向往的生活”)|![](./assets/image_3_2_0.png)|
|一只海鸥 (a seagull)|![](./assets/image_3_1_0.png)|文字“生活” (the text “生活”)|![](./assets/image_3_3_0.png)|

</div>

</div>

## Inference Code

Install DiffSynth-Studio:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Run inference:

```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch, requests

# Assemble the pipeline: the fine-tuned transformer from this repo, plus the
# text encoder, VAE, and processor reused from the upstream Qwen-Image models.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Describe the layer to extract from the input image.
prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"

# Download the example input and resize it to the native 1024x1024 resolution.
input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
input_image.save("image_input.png")

# Generate the single layer matching the prompt.
images = pipe(
    prompt,
    seed=0,
    num_inference_steps=30, cfg_scale=4,
    height=1024, width=1024,
    layer_input_image=input_image,
    layer_num=0,
)
images[0].save("image.png")
```
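
Since each call returns a single layer, a full decomposition is simply one call per layer description. A minimal sketch, reusing `pipe` and `input_image` from the script above; the prompts mirror Example 1:

```python
# Sketch only: decompose one input into several layers, one call per prompt.
layer_prompts = [
    "A solid, uniform color with no distinguishable features or objects",
    "A cartoon skeleton character wearing a purple hat and holding a gift box",
    "Text 'TRICK OR TREAT'",
]
for i, layer_prompt in enumerate(layer_prompts):
    layer = pipe(
        layer_prompt,
        seed=0,
        num_inference_steps=30, cfg_scale=4,
        height=1024, width=1024,
        layer_input_image=input_image,
        layer_num=0,
    )[0]
    layer.save(f"layer_{i}.png")
```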