Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -27,20 +27,28 @@ JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-gui
|
|
| 27 |
|
| 28 |
**Requirements**: Python >= 3.10, CUDA-capable GPU
|
| 29 |
|
| 30 |
-
###
|
| 31 |
-
|
| 32 |
| Package | Version | Purpose |
|
| 33 |
|---------|---------|---------|
|
| 34 |
| `torch` | >= 2.8 | PyTorch |
|
| 35 |
| `transformers` | >= 4.57.0, < 4.58.0 | Text encoder |
|
| 36 |
|
| 37 |
|
| 38 |
-
###
|
| 39 |
```bash
|
| 40 |
-
pip install
|
| 41 |
```
|
| 42 |
|
| 43 |
-
###
|
| 44 |
```python
|
| 45 |
import torch
|
| 46 |
from PIL import Image
|
|
@@ -54,7 +62,7 @@ pipeline.set_progress_bar_config(disable=None)
|
|
| 54 |
print("pipeline loaded")
|
| 55 |
|
| 56 |
img_path = "./test_images/input.png"
|
| 57 |
-
prompt = "
|
| 58 |
|
| 59 |
image = Image.open(img_path).convert("RGB")
|
| 60 |
prompts = [f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"]
|
|
@@ -103,8 +111,12 @@ Move the <object> into the red box and finally remove the red box.
|
|
| 103 |
**Example:**
|
| 104 |
|
| 105 |
```text
|
| 106 |
-
Move the
|
| 107 |
```
|
| 108 |
|
| 109 |
#### 2. Object Rotation
|
| 110 |
|
|
@@ -136,9 +148,13 @@ Rotate the <object> to show the <view> side view.
|
|
| 136 |
**Examples:**
|
| 137 |
|
| 138 |
```text
|
| 139 |
-
Rotate the
|
| 140 |
-
Rotate the car to show the rear left side view.
|
| 141 |
```
|
| 142 |
|
| 143 |
#### 3. Camera Control
|
| 144 |
|
|
@@ -175,10 +191,14 @@ Move the camera.
|
|
| 175 |
|
| 176 |
```text
|
| 177 |
Move the camera.
|
| 178 |
-
- Camera rotation: Yaw
|
| 179 |
- Camera zoom: unchanged.
|
| 180 |
- Keep the 3D scene static; only change the viewpoint.
|
| 181 |
```
|
| 182 |
|
| 183 |
## License Agreement
|
| 184 |
|
|
@@ -186,5 +206,5 @@ JoyAI-Image is licensed under Apache 2.0.
|
|
| 186 |
|
| 187 |
## ☎️ We're Hiring!
|
| 188 |
|
| 189 |
-
We are actively hiring Research Scientists, Engineers, and Interns to join us in building next-generation generative foundation models and bringing them into real-world applications. If you’re interested, please send your resume to: [huanghaoyang.ocean@jd.com](mailto:huanghaoyang.ocean@jd.com)
|
| 190 |
|
|
|
|
| 27 |
|
| 28 |
**Requirements**: Python >= 3.10, CUDA-capable GPU
|
| 29 |
|
| 30 |
+
### Core Dependencies
|
| 31 |
+
The `transformers` version must satisfy **>= 4.57.0, < 4.58.0**; other versions may produce incorrect results.
|
| 32 |
| Package | Version | Purpose |
|
| 33 |
|---------|---------|---------|
|
| 34 |
| `torch` | >= 2.8 | PyTorch |
|
| 35 |
| `transformers` | >= 4.57.0, < 4.58.0 | Text encoder |
|
| 36 |
+
| `torchvision` | - | Image processing |
|
| 37 |
+
| `einops` | - | Tensor manipulation |
|
| 38 |
|
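As a quick sanity check before loading the pipeline, the version constraints in the table above can be verified programmatically. The following is a minimal sketch, not part of the official setup: the helper names `parse_version`, `transformers_version_ok`, `torch_version_ok`, and `check_environment` are hypothetical, and the parser only reads the leading numeric components, so a pre-release such as `4.57.0.dev0` is treated like `4.57.0`.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.57.6' or '2.8.0+cu128' into a tuple of at least three integers."""
    parts = []
    # Drop any local build tag such as '+cu128' before splitting on dots.
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4.57' compares equal to '4.57.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_version_ok(version: str) -> bool:
    """True when the version satisfies >= 4.57.0 and < 4.58.0."""
    return (4, 57, 0) <= parse_version(version) < (4, 58, 0)


def torch_version_ok(version: str) -> bool:
    """True when the version satisfies >= 2.8."""
    return parse_version(version) >= (2, 8, 0)


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    checks = {"torch": torch_version_ok, "transformers": transformers_version_ok}
    for name, is_ok in checks.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not is_ok(installed):
            problems.append(f"{name} {installed} is outside the supported range")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Comparing integer tuples rather than raw strings avoids mistakes such as ranking `2.10` below `2.8`.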
| 39 |
+
### Install the JoyAI-Image-Edit [Pull Request](https://github.com/huggingface/diffusers/pull/13444) for diffusers
|
| 40 |
+
```bash
|
| 41 |
+
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/13444/head
|
| 42 |
+
```
|
| 43 |
|
| 44 |
+
### Or install from this repo (the PR will be merged into the diffusers main branch soon)
|
| 45 |
```bash
|
| 46 |
+
pip install torch==2.8 transformers==4.57.6 torchvision einops
|
| 47 |
+
|
| 48 |
+
pip install git+https://github.com/Moran232/diffusers.git@joyimage_edit
|
| 49 |
```
|
| 50 |
|
| 51 |
+
### Running with Diffusers
|
| 52 |
```python
|
| 53 |
import torch
|
| 54 |
from PIL import Image
|
|
|
|
| 62 |
print("pipeline loaded")
|
| 63 |
|
| 64 |
img_path = "./test_images/input.png"
|
| 65 |
+
prompt = "Move the board into the red box and finally remove the red box."
|
| 66 |
|
| 67 |
image = Image.open(img_path).convert("RGB")
|
| 68 |
prompts = [f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"]
|
|
|
|
| 111 |
**Example:**
|
| 112 |
|
| 113 |
```text
|
| 114 |
+
Move the board into the red box and finally remove the red box.
|
| 115 |
```
|
| 116 |
+
<p align="center">
|
| 117 |
+
<img src="test_images/input1.png" width="40%" />
|
| 118 |
+
<img src="test_images/output1_predicted.png" width="40%" />
|
| 119 |
+
</p>
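The spatial-editing instruction follows the fixed template `Move the <object> into the red box and finally remove the red box.`, and the quick-start code wraps every prompt in a user turn with an `<image>` placeholder. A minimal sketch of both steps is shown here; the helper names `build_move_prompt` and `wrap_as_user_turn` are hypothetical and are not part of the pipeline API.

```python
def build_move_prompt(target: str) -> str:
    """Fill the spatial-editing template with the object to move."""
    target = target.strip()
    if not target:
        raise ValueError("target must be a non-empty object name")
    return f"Move the {target} into the red box and finally remove the red box."


def wrap_as_user_turn(prompt: str) -> str:
    """Wrap an instruction in the chat format used by the quick-start code."""
    return f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"


if __name__ == "__main__":
    prompt = build_move_prompt("board")
    prompts = [wrap_as_user_turn(prompt)]
    print(prompts[0])
```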
|
| 120 |
|
| 121 |
#### 2. Object Rotation
|
| 122 |
|
|
|
|
| 148 |
**Examples:**
|
| 149 |
|
| 150 |
```text
|
| 151 |
+
Rotate the dog to show the left side view.
|
|
|
|
| 152 |
```
|
| 153 |
+
<p align="center">
|
| 154 |
+
<img src="test_images/input2.png" width="40%" />
|
| 155 |
+
<img src="test_images/output2_predicted.png" width="40%" />
|
| 156 |
+
</p>
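Rotation prompts follow the template `Rotate the <object> to show the <view> side view.`, so several viewpoints of one object can be generated from a list. The sketch below uses a hypothetical helper, `build_rotation_prompt`. The view names are taken from the examples in this README, and the helper does not validate them, because the full set of supported views is not listed here.

```python
def build_rotation_prompt(target: str, view: str) -> str:
    """Fill the rotation template with an object name and a view name."""
    return f"Rotate the {target.strip()} to show the {view.strip()} side view."


if __name__ == "__main__":
    # Build one prompt per requested view of the same object.
    for view in ("left", "rear left"):
        print(build_rotation_prompt("car", view))
```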
|
| 157 |
+
|
| 158 |
|
| 159 |
#### 3. Camera Control
|
| 160 |
|
|
|
|
| 191 |
|
| 192 |
```text
|
| 193 |
Move the camera.
|
| 194 |
+
- Camera rotation: Yaw 0.0°, Pitch -15.0°.
|
| 195 |
- Camera zoom: unchanged.
|
| 196 |
- Keep the 3D scene static; only change the viewpoint.
|
| 197 |
```
|
| 198 |
+
<p align="center">
|
| 199 |
+
<img src="test_images/input3.png" width="40%" />
|
| 200 |
+
<img src="test_images/output3_predicted.png" width="40%" />
|
| 201 |
+
</p>
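Because the camera-control prompt has a rigid multi-line layout, generating it from numeric angles helps avoid formatting slips. This is a minimal sketch with a hypothetical helper, `build_camera_prompt`. It reproduces the example above with one decimal place per angle and keeps the zoom line fixed at `unchanged`, since no other zoom wording appears in this section. The accepted angle ranges and sign conventions are not stated here, so the helper does not check them.

```python
def build_camera_prompt(yaw_deg: float, pitch_deg: float) -> str:
    """Format a camera-control prompt from yaw and pitch angles in degrees."""
    lines = [
        "Move the camera.",
        f"- Camera rotation: Yaw {yaw_deg:.1f}°, Pitch {pitch_deg:.1f}°.",
        "- Camera zoom: unchanged.",
        "- Keep the 3D scene static; only change the viewpoint.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_camera_prompt(0.0, -15.0))
```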
|
| 202 |
|
| 203 |
## License Agreement
|
| 204 |
|
|
|
|
| 206 |
|
| 207 |
## ☎️ We're Hiring!
|
| 208 |
|
| 209 |
+
We are actively hiring Research Scientists, AI Infra Engineers, and Interns to join us in building next-generation generative foundation models and bringing them into real-world applications. If you’re interested, please send your resume to: [huanghaoyang.ocean@jd.com](mailto:huanghaoyang.ocean@jd.com)
|
| 210 |
|