---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Medium TensorRT
## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Medium**, developed jointly by [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation uses NVIDIA's TensorRT deep learning inference library to deliver faster inference while maintaining the image quality of the original model.

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that offers improved image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

## Model Details

### Model Description
This repository holds the ONNX exports of the T5, MMDiT and VAE models in BF16 precision.

## Performance using TensorRT 10.13
### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms    | 2496.63 ms |
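
To make the table easier to read, the short sketch below derives a few figures from the H100 BF16 row: the average MMDiT time per denoising step, the sum of the listed stages, and the part of the reported total that the listed stages do not cover. The numbers are copied from the table; the variable names are illustrative, and this README does not state what accounts for the remainder.

```shell
#!/bin/sh
# Stage timings in milliseconds, copied from the H100 BF16 row of the table.
clip_g=16.52; clip_l=6.83; t5=8.46; mmdit=2358.34; vae=72.58; total=2496.63
steps=30

# awk does the floating-point arithmetic; the shell only passes values in.
summary=$(awk -v g="$clip_g" -v l="$clip_l" -v t="$t5" -v m="$mmdit" \
              -v v="$vae" -v tot="$total" -v n="$steps" 'BEGIN {
    stages = g + l + t + m + v
    printf "mmdit_per_step_ms=%.2f\n", m / n          # average per denoising step
    printf "stage_sum_ms=%.2f\n", stages              # sum of the five listed stages
    printf "unlisted_ms=%.2f\n", tot - stages         # reported total minus listed stages
    printf "mmdit_share_pct=%.1f\n", 100 * m / tot    # MMDiT share of the total
}')
echo "$summary"
```

With the values above, this works out to about 78.61 ms per MMDiT step, 2462.73 ms across the listed stages, 33.90 ms not attributed to any listed stage, and an MMDiT share of roughly 94.5% of the total.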

## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) on launching a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install libraries and requirements:
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a Hugging Face user access token.
To download the Stable Diffusion 3.5 Medium checkpoints, request access on the [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) page.
You will then need to obtain a `read` access token for the Hugging Face Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=<your access token>
```
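
Because a missing token typically only surfaces once the download starts, it can help to check the variable first. The helper below is a minimal sketch; `require_hf_token` is a hypothetical name and is not part of the TensorRT demo. It only verifies that `HF_TOKEN` is non-empty and does not validate the token against the Hub.

```shell
#!/bin/sh
# Hypothetical helper (not part of the TensorRT demo): succeeds only if
# HF_TOKEN is set to a non-empty value. It never prints the token itself.
require_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before running the demo." >&2
        return 1
    fi
    echo "HF_TOKEN is set (${#HF_TOKEN} characters)."
}
```

A typical use is to guard the inference command, for example `require_hf_token && python3 demo_txt2img_sd35.py ...`.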

4. Perform TensorRT optimized inference:

- **Stable Diffusion 3.5 Medium in BF16 precision**

```shell
python3 demo_txt2img_sd35.py \
"a beautiful photograph of Mt. Fuji during cherry blossom" \
--version=3.5-medium \
--bf16 \
--download-onnx-models \
--denoising-steps=30 \
--guidance-scale 3.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
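
To run the same configuration over several prompts, the command above can be wrapped in a small function. The sketch below is illustrative only: `run_sd35` and `DRY_RUN` are hypothetical names, the second prompt is a made-up example, and the flags are exactly those shown in this README. By default it prints each command instead of executing it, with the token left as a literal `$HF_TOKEN` so the secret is never echoed. The printed form is not shell-quoted and is meant for inspection, not copy-paste.

```shell
#!/bin/sh
# Hypothetical wrapper around the demo command from step 4.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_sd35() {
    # "$1" is the prompt; the remaining arguments mirror the README command.
    set -- python3 demo_txt2img_sd35.py "$1" \
        --version=3.5-medium --bf16 --download-onnx-models \
        --denoising-steps=30 --guidance-scale 3.5 \
        --build-static-batch --use-cuda-graph
    if [ "$DRY_RUN" = 1 ]; then
        # Show the token as a placeholder rather than expanding it.
        printf '%s\n' "$* --hf-token=\$HF_TOKEN"
    else
        "$@" "--hf-token=${HF_TOKEN:-}"
    fi
}

# One prompt per line; each prompt is a separate invocation of the demo script.
while IFS= read -r prompt; do
    if [ -n "$prompt" ]; then
        run_sd35 "$prompt"
    fi
done <<'EOF'
a beautiful photograph of Mt. Fuji during cherry blossom
a lighthouse on a rocky coast at sunset
EOF
```

Setting `DRY_RUN=0` before running the script executes the commands from inside `demo/Diffusion`, with `HF_TOKEN` exported as in step 3.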