Spaces: Running on Zero

Commit · 682ea96
1 Parent(s): 9c55707

update app

Files changed:
- .gitignore (+1, -0)
- README.md (+152, -1)
- app.py (+312, -187)
- flux_pipeline_mod.py (+52, -118)
- infer.py (+37, -29)
- requirements.txt (+3, -1)
- requirements_local.txt (+3, -1)
.gitignore
CHANGED
@@ -10,3 +10,4 @@ venv/
 .DS_Store
 .gradio
 download.py
+outputs/
README.md
CHANGED
@@ -11,4 +11,155 @@ license: apache-2.0
 short_description: Flux 1 Panorama
 ---

# Panorama FLUX 🏞️✨

[](https://huggingface.co/spaces/elismasilva/flux-1-panorama) <!--- Replace with your final space link -->

Create stunning, seamless panoramic images by combining multiple distinct scenes with the power of the **FLUX.1-schnell** model. This application uses an advanced "Mixture of Diffusers" tiling pipeline to generate high-resolution compositions from left, center, and right text prompts.

<!--- Optional: Replace with a link to an example image you generated -->

## What is Panorama FLUX?

Panorama FLUX is a creative tool that leverages a sophisticated tiling mechanism to generate a single, wide-format image from three separate text prompts. Instead of stretching a single concept, you can describe different but related scenes for the left, center, and right portions of the image. The pipeline then intelligently generates each part and seamlessly blends them together.

This is ideal for:
* **Creating expansive landscapes:** Describe a beach that transitions into an ocean, which then meets a distant jungle.
* **Composing complex scenes:** Place different characters or objects side by side in a shared environment.
* **Generating ultra-wide art:** Create unique, high-resolution images perfect for wallpapers or digital art.

The core technology uses a custom `FluxMoDTilingPipeline` built on the Diffusers library, specifically adapted for the **FLUX.1-schnell** model's "Embedded Guidance" mechanism for fast, high-quality results.
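
The blending works by weighting each tile's noise prediction with a window that peaks at the tile centre and falls off toward the edges. A minimal sketch of the cosine window, simplified from `_generate_cosine_weights` in `flux_pipeline_mod.py`:

```python
import numpy as np

def cosine_window(width: int, height: int) -> np.ndarray:
    """2D cosine window: ~1.0 at the tile centre, falling to ~0 at the edges."""
    x, y = np.arange(width), np.arange(height)
    mid_x, mid_y = (width - 1) / 2, (height - 1) / 2
    x_probs = np.cos(np.pi * (x - mid_x) / width)
    y_probs = np.cos(np.pi * (y - mid_y) / height)
    return np.outer(y_probs, x_probs)  # shape (height, width)
```

In the overlap region the windows of neighbouring tiles cross-fade, which is what removes visible seams.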

### Key Features
* **Multi-Prompt Composition:** Control the left, center, and right of your image with unique prompts.
* **Seamless Stitching:** Uses advanced blending methods (Cosine or Gaussian) to eliminate visible seams between tiles.
* **High-Resolution Output:** Generates images far wider than what a standard pipeline can handle in a single pass.
* **Efficient Memory Management:** Integrates `mmgp` for local use on consumer GPUs and supports standard `diffusers` offloading for cloud environments via the `USE_MMGP` environment variable.
* **Optimized for FLUX.1-schnell:** Tailored to the 4-step inference and `guidance_scale=0.0` architecture of the distilled FLUX model.
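
For reference, a minimal sketch of driving the pipeline directly from Python. The prompts are illustrative and the parameter values mirror the defaults used elsewhere in this README; `infer.py` (described below) is the full version:

```python
import torch
from flux_pipeline_mod import FluxMoDTilingPipeline

pipe = FluxMoDTilingPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# One row of three prompts -> a 1x3 grid of tiles blended into a panorama.
image = pipe(
    prompt=[["a sunlit beach", "rolling ocean waves", "a distant jungle coastline"]],
    height=1024,
    width=3072,
    num_inference_steps=4,           # FLUX.1-schnell is distilled for ~4 steps
    guidance_scale=0.0,              # embedded guidance; no classifier-free guidance
    tile_overlap=256,
    tile_weighting_method="Cosine",  # or "Gaussian"
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("panorama.png")
```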
---

## Running the App Locally

Follow these steps to run the Gradio application on your own machine.

### 1. Prerequisites
* Python 3.9+
* Git and Git LFS installed (`git-lfs` is required to clone large model files).

### 2. Clone the Repository
```bash
git clone https://huggingface.co/spaces/elismasilva/flux-1-panorama
cd flux-1-panorama
```

### 3. Set Up a Virtual Environment (Recommended)
```bash
# Windows
python -m venv venv
.\venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate
```

### 4. Install Dependencies
This project includes a specific requirements file for local execution.
```bash
pip install -r requirements_local.txt
```

### 5. Configure the Model Path
By default, the app is configured to load the model from the Hugging Face Hub (`"black-forest-labs/FLUX.1-schnell"`). If you have downloaded the model locally (e.g., to `F:\models\flux_schnell`), you need to update the path in `app.py`.

Open `app.py` and modify this line:
```python
# app.py - Line 26 (approximately)

pipe = FluxMoDTilingPipeline.from_pretrained(
    "path/to/your/local/model",  # <-- CHANGE THIS
    torch_dtype=torch.bfloat16
).to("cuda")
```
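
Alternatively, as of this commit `app.py` reads the model path from a `MODEL_PATH` environment variable (falling back to the Hub ID), so you can point it at a local download without editing the file:

```bash
export MODEL_PATH=/path/to/your/local/model
python app.py
```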

### 6. Run the Gradio App
```bash
python app.py
```
The application will start and provide a local URL (usually `http://127.0.0.1:7860`) that you can open in your web browser.

---

## Using the Command-Line Script (`infer.py`)

The `infer.py` script is a great way to test the pipeline directly, without the Gradio interface. This is useful for debugging, checking performance, and ensuring everything works correctly.

### 1. Configure the Script
Open the `infer.py` file in a text editor. You can modify the parameters inside the `main()` function to match your desired output.

```python
# infer.py

# ... (imports)

def main():
    # --- 1. Load Model ---
    MODEL_PATH = "black-forest-labs/FLUX.1-schnell"  # Or your local path

    # ... (model loading code)

    # --- 2. Set Up Inference Parameters ---
    prompt_grid = [[
        "Your left prompt here.",
        "Your center prompt here.",
        "Your right prompt here."
    ]]

    target_height = 1024
    target_width = 3072
    # ... and so on for other parameters like steps, seed, etc.
```
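
Note that the pipeline snaps tile sizes down to multiples of 16, so the actual canvas can end up slightly smaller than the requested target. A quick sketch of the arithmetic (mirroring `do_calc_tile` in `app.py`) for the defaults above:

```python
num_cols, num_rows, overlap = 3, 1, 256
target_width, target_height = 3072, 1024

tile_width = (target_width + (num_cols - 1) * overlap) // num_cols    # 1194
tile_height = (target_height + (num_rows - 1) * overlap) // num_rows  # 1024
tile_width -= tile_width % 16    # 1184
tile_height -= tile_height % 16  # 1024

final_width = tile_width * num_cols - (num_cols - 1) * overlap    # 3040
final_height = tile_height * num_rows - (num_rows - 1) * overlap  # 1024
```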

### 2. Run the Script
Execute the script from your terminal:
```bash
python infer.py
```
The script will print its progress to the console, including the `tqdm` progress bar, and save the final image as `inference_output_schnell.png` in the project directory.
---

## Environment Variables

### `USE_MMGP`
This variable controls which memory optimization strategy to use.

* **To use `mmgp` (Recommended for local use):**
Ensure the variable is **not set**, or set it to `true`. This is the default behavior.
```bash
# (No action needed, or run)
# Linux/macOS: export USE_MMGP=true
# Windows CMD: set USE_MMGP=true
python app.py
```

* **To disable `mmgp` and use standard `diffusers` CPU offloading (For Hugging Face Spaces or troubleshooting):**
Set the variable to `false`.
```bash
# Linux/macOS
USE_MMGP=false python app.py

# Windows CMD
set USE_MMGP=false
python app.py

# Windows PowerShell
$env:USE_MMGP="false"
python app.py
```
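
For reference, `app.py` resolves the flag like this, so any value other than `false`, `0`, `no`, or `none` leaves `mmgp` enabled:

```python
import os

USE_MMGP_ENV = os.getenv("USE_MMGP", "true").lower()
USE_MMGP = USE_MMGP_ENV not in ("false", "0", "no", "none")
```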
## Acknowledgements

* **Black Forest Labs** for the powerful FLUX models.
* The original authors of the **Mixture of Diffusers** technique.
* **Hugging Face** for the `diffusers` library.
app.py
CHANGED
@@ -4,64 +4,103 @@ import os
 import random
 import numpy as np
 import torch
+from transformers import pipeline

+# Import the pipeline
 from flux_pipeline_mod import FluxMoDTilingPipeline

+# 1. Load Translation Models ---
+# These models are small and run efficiently on CPU.
+print("Loading translation models...")
+try:
+    ko_en_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
+    zh_en_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
+    print("Translation models loaded successfully.")
+except Exception as e:
+    print(f"Could not load translation models: {e}")
+    ko_en_translator = None
+    zh_en_translator = None
+
+# 2. Conditional MMGP Setup ---
+USE_MMGP_ENV = os.getenv("USE_MMGP", "true").lower()
+USE_MMGP = USE_MMGP_ENV not in ("false", "0", "no", "none")

-# Conditionally import mmgp
 offload = None
 if USE_MMGP:
+    print("INFO: Attempting to use MMGP.")
     try:
         from mmgp import offload, profile_type
         print("Successfully imported MMGP.")
     except ImportError:
+        print("WARNING: MMGP import failed. Falling back to standard offload.")
         USE_MMGP = False
+else:
+    print("INFO: MMGP is disabled.")

 MAX_SEED = np.iinfo(np.int32).max

+# 3. Load the Main Pipeline ---
+print("Loading the FLUX Tiling pipeline...")
+# Use an environment variable for the model path to make it flexible
+MODEL_PATH = os.getenv("MODEL_PATH", "black-forest-labs/FLUX.1-schnell")
+print(f"Loading model from: {MODEL_PATH}")
+
+pipe = FluxMoDTilingPipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to(
+    "cuda"
+)

-# 3. Apply Memory Optimization based on the flag
 if USE_MMGP and offload:
     print("Applying LowRAM_LowVRAM offload profile via MMGP...")
     offload.profile(pipe, profile_type.LowRAM_LowVRAM)
 else:
+    print("Attempting to use the standard Diffusers CPU offload...")
     try:
         pipe.enable_model_cpu_offload()
     except Exception as e:
         print(f"Could not apply standard offload: {e}")

-#pipe.enable_vae_tiling()
-#pipe.enable_vae_slicing()
 print("Pipeline loaded and ready.")

+
+# Helper Functions
+def translate_prompt(text: str, language: str) -> str:
+    """Translates text to English if the selected language is not English."""
+    if language == "English" or not text.strip():
+        return text
+
+    translated_text = text
+    if language == "Korean" and ko_en_translator:
+        # Check if Korean (Hangul syllable) characters are present
+        if any("\uac00" <= char <= "\ud7a3" for char in text):
+            print(f"Translating Korean to English: '{text}'")
+            translated_text = ko_en_translator(text)[0]["translation_text"]
+            print(f" -> Translated: '{translated_text}'")
+    elif language == "Chinese" and zh_en_translator:
+        # Check if Chinese (CJK unified ideograph) characters are present
+        if any("\u4e00" <= char <= "\u9fff" for char in text):
+            print(f"Translating Chinese to English: '{text}'")
+            translated_text = zh_en_translator(text)[0]["translation_text"]
+            print(f" -> Translated: '{translated_text}'")
+
+    return translated_text


 def create_hdr_effect(image, hdr_strength):
     if hdr_strength == 0:
         return image
     from PIL import ImageEnhance, Image
+    if isinstance(image, Image.Image):
+        image = np.array(image)
     from scipy.ndimage import gaussian_filter
     blurred = gaussian_filter(image, sigma=5)
     sharpened = np.clip(image + hdr_strength * (image - blurred), 0, 255).astype(np.uint8)
     pil_img = Image.fromarray(sharpened)
     converter = ImageEnhance.Color(pil_img)
     return converter.enhance(1 + hdr_strength)

@@ -69,25 +108,38 @@ def create_hdr_effect(image, hdr_strength):

 @spaces.GPU(duration=120)
 def predict(
     left_prompt,
+    center_prompt,
+    right_prompt,
+    left_gs,
+    center_gs,
+    right_gs,
+    overlap_pixels,
+    steps,
+    generation_seed,
+    tile_weighting_method,
+    prompt_language,
+    _,   # tile_height preview (display-only, unused here)
+    __,  # tile_width preview (display-only, unused here)
+    target_height,
+    target_width,
+    hdr,
     progress=gr.Progress(track_tqdm=True),
 ):
     global pipe
-    generator = torch.Generator(generator_device).manual_seed(generation_seed)
+    generator = torch.Generator("cuda").manual_seed(generation_seed)
     final_height, final_width = int(target_height), int(target_width)

+    # Translate prompts if necessary
+    translated_left = translate_prompt(left_prompt, prompt_language)
+    translated_center = translate_prompt(center_prompt, prompt_language)
+    translated_right = translate_prompt(right_prompt, prompt_language)
+
+    print("Starting generation with Tiling Pipeline (Composition Mode)...")
     image = pipe(
+        prompt=[[translated_left, translated_center, translated_right]],
         height=final_height,
         width=final_width,
-        negative_prompt=negative_prompt,
         tile_overlap=overlap_pixels,
         guidance_scale_tiles=[[left_gs, center_gs, right_gs]],
         tile_weighting_method=tile_weighting_method,

@@ -98,22 +150,16 @@ def predict(

     return create_hdr_effect(image, hdr)


 def do_calc_tile(target_height, target_width, overlap_pixels):
     num_cols = 3
     num_rows = 1
     tile_width = (target_width + (num_cols - 1) * overlap_pixels) // num_cols
     tile_height = (target_height + (num_rows - 1) * overlap_pixels) // num_rows
     tile_width -= tile_width % 16
     tile_height -= tile_height % 16
     final_width = tile_width * num_cols - (num_cols - 1) * overlap_pixels
     final_height = tile_height * num_rows - (num_rows - 1) * overlap_pixels
-    print("--- UI Tile Size Preview ---")
-    print(f"Ideal Tile Height/Width: {tile_height}/{tile_width}")
-    print(f"Calculated Final Height/Width: {final_height}/{final_width}\n")
     return (
         gr.update(value=tile_height),
         gr.update(value=tile_width),

@@ -121,122 +167,65 @@ def do_calc_tile(target_height, target_width, overlap_pixels):

         gr.update(value=final_width),
     )


 def clear_result():
     return gr.update(value=None)


 def run_for_examples(
     left_prompt,
+    center_prompt,
+    right_prompt,
+    left_gs,
+    center_gs,
+    right_gs,
+    overlap_pixels,
+    steps,
+    generation_seed,
+    tile_weighting_method,
+    tile_height,
+    tile_width,
+    target_height,
+    target_width,
+    hdr,
 ):
     return predict(
         left_prompt,
+        center_prompt,
+        right_prompt,
+        left_gs,
+        center_gs,
+        right_gs,
+        overlap_pixels,
+        steps,
+        generation_seed,
+        tile_weighting_method,
+        "English",  # examples are written in English, so translation is skipped
+        tile_height,
+        tile_width,
+        target_height,
+        target_width,
+        hdr,
     )


 def randomize_seed_fn(generation_seed: int, randomize_seed: bool) -> int:
     if randomize_seed:
         generation_seed = random.randint(0, MAX_SEED)
     return generation_seed


 # UI Layout
-css = "..."
-title = "..."
-
-# theme = gr.themes.Default(
-#     primary_hue='indigo',
-#     secondary_hue='cyan',
-#     neutral_hue='gray'
-# ).set(
-#     body_background_fill='*neutral_100',
-#     body_background_fill_dark='*neutral_900',
-#     body_text_color='*neutral_900',
-#     body_text_color_dark='*neutral_100',
-#     input_background_fill='white',
-#     input_background_fill_dark='*neutral_800',
-#     button_primary_background_fill='*primary_500',
-#     button_primary_background_fill_dark='*primary_700',
-#     button_primary_text_color='white',
-#     button_primary_text_color_dark='white',
-#     button_secondary_background_fill='*secondary_500',
-#     button_secondary_background_fill_dark='*secondary_700',
-#     button_secondary_text_color='white',
-#     button_secondary_text_color_dark='white'
-# )
 theme = gr.themes.Default(
-    primary_hue=
-    secondary_hue='teal',
-    neutral_hue='neutral'
-).set(
-    body_background_fill='*neutral_100',
-    body_background_fill_dark='*neutral_900',
-    body_text_color='*neutral_700',
-    body_text_color_dark='*neutral_200',
-    body_text_weight='400',
-    link_text_color='*primary_500',
-    link_text_color_dark='*primary_400',
-    code_background_fill='*neutral_100',
-    code_background_fill_dark='*neutral_800',
-    shadow_drop='0 1px 3px rgba(0,0,0,0.1)',
-    shadow_inset='inset 0 2px 4px rgba(0,0,0,0.05)',
-    block_background_fill='*neutral_50',
-    block_background_fill_dark='*neutral_700',
-    block_border_color='*neutral_200',
-    block_border_color_dark='*neutral_600',
-    block_border_width='1px',
-    block_border_width_dark='1px',
-    block_label_background_fill='*primary_50',
-    block_label_background_fill_dark='*primary_600',
-    block_label_text_color='*primary_600',
-    block_label_text_color_dark='*primary_50',
-    panel_background_fill='white',
-    panel_background_fill_dark='*neutral_800',
-    panel_border_color='*neutral_200',
-    panel_border_color_dark='*neutral_700',
-    panel_border_width='1px',
-    panel_border_width_dark='1px',
-    input_background_fill='white',
-    input_background_fill_dark='*neutral_800',
-    input_border_color='*neutral_300',
-    input_border_color_dark='*neutral_700',
-    slider_color='*primary_500',
-    slider_color_dark='*primary_400',
-    button_primary_background_fill='*primary_600',
-    button_primary_background_fill_dark='*primary_500',
-    button_primary_background_fill_hover='*primary_700',
-    button_primary_background_fill_hover_dark='*primary_400',
-    button_primary_border_color='transparent',
-    button_primary_border_color_dark='transparent',
-    button_primary_text_color='white',
-    button_primary_text_color_dark='white',
-    button_secondary_background_fill='*neutral_200',
-    button_secondary_background_fill_dark='*neutral_600',
-    button_secondary_background_fill_hover='*neutral_300',
-    button_secondary_background_fill_hover_dark='*neutral_500',
-    button_secondary_border_color='transparent',
-    button_secondary_border_color_dark='transparent',
-    button_secondary_text_color='*neutral_700',
-    button_secondary_text_color_dark='*neutral_200'
+    primary_hue="blue", secondary_hue="teal", neutral_hue="neutral"
 )
-
-# body { font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; }
-# """
-css_code = ""
-try:
-    with open("./style.css", "r", encoding="utf-8") as f:
-        css_code += f.read() + "\n"
-except FileNotFoundError:
-    pass
-title = """<h1 align="center">Panorama FLUX - Mixture-of-Diffusers for FLUX ✨</h1>
+title = """<h1 align="center">Panorama FLUX 🏞️✨</h1>
 <div style="text-align: center;">
-<span>An advanced tiling pipeline for creative composition and large-scale image generation with the FLUX model.</span>
+<span>An advanced tiling pipeline for creative composition and large-scale image generation with the FLUX.1-schnell model.</span>
 </div>
 """

-with gr.Blocks(css=css_code, theme=theme, title="Panorama FLUX") as app:
+with gr.Blocks(theme=theme, title="Panorama FLUX") as app:
     gr.Markdown(title)
     with gr.Row():
         with gr.Column(scale=7):

@@ -245,94 +234,230 @@ with gr.Blocks(css=css_code, theme=theme, title="Panorama FLUX") as app:

         with gr.Column(scale=1):
             gr.Markdown("### Left Region")
             left_prompt = gr.Textbox(lines=4, label="Prompt for left side")
+            left_gs = gr.Slider(
+                minimum=0.0, maximum=10.0, value=0.0, step=0.1, label="Left Guidance"
+            )
         with gr.Column(scale=1):
             gr.Markdown("### Center Region")
             center_prompt = gr.Textbox(lines=4, label="Prompt for the center")
+            center_gs = gr.Slider(
+                minimum=0.0, maximum=10.0, value=0.0, step=0.1, label="Center Guidance"
+            )
         with gr.Column(scale=1):
             gr.Markdown("### Right Region")
             right_prompt = gr.Textbox(lines=4, label="Prompt for right side")
+            right_gs = gr.Slider(
+                minimum=0.0, maximum=10.0, value=0.0, step=0.1, label="Right Guidance"
+            )

     with gr.Row():
+        result = gr.Image(
+            label="Generated Image", show_label=True, format="png", interactive=False
+        )

     with gr.Sidebar():
         gr.Markdown("### Tiling & Generation Parameters")
+
+        # New Language Selector
+        prompt_language = gr.Radio(
+            choices=["English", "Korean", "Chinese"],
+            value="English",
+            label="Prompt Language",
+            info="Select the language you will type your prompts in.",
+        )
+
         with gr.Row():
+            height = gr.Slider(
+                label="Target Height", value=1024, step=16, minimum=512, maximum=2048
+            )
+            width = gr.Slider(
+                label="Target Width", value=3072, step=16, minimum=512, maximum=4096
+            )
        with gr.Row():
+            overlap = gr.Slider(
+                minimum=0, maximum=512, value=256, step=16, label="Tile Overlap"
+            )
+            tile_weighting_method = gr.Dropdown(
+                label="Blending Method", choices=["Cosine", "Gaussian"], value="Cosine"
+            )
         with gr.Row():
+            calc_tile = gr.Button("Calculate Final Dimensions", variant="primary")
         with gr.Row():
+            new_target_height = gr.Textbox(
+                label="Actual Image Height", value=1024, interactive=False
+            )
+            new_target_width = gr.Textbox(
+                label="Actual Image Width", value=3072, interactive=False
+            )
         with gr.Row():
+            tile_height = gr.Textbox(label="Ideal Tile Height", value=1024, interactive=False)
+            tile_width = gr.Textbox(label="Ideal Tile Width", value=1152, interactive=False)
         with gr.Row():
+            steps = gr.Slider(minimum=1, maximum=10, value=4, step=1, label="Inference Steps")
         with gr.Row():
+            generation_seed = gr.Slider(
+                label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=0
+            )
             randomize_seed = gr.Checkbox(label="Randomize Seed", value=True)
         with gr.Row():
+            hdr = gr.Slider(
+                minimum=0.0, maximum=1.0, value=0.1, step=0.01, label="HDR Effect"
+            )

         with gr.Row():
             gr.Examples(
                 examples=[
+                    [
+                        "A vibrant medieval marketplace...",
+                        "A majestic stone castle...",
+                        "A dense, dark forest...",
+                        0.0, 0.0, 0.0, 256, 4, 12345, "Cosine", 1024, 1152, 1024, 3072, 0,
+                    ],
+                    [
+                        "A vibrant mountain slope in full spring bloom, covered in colorful wildflowers and lush green grass, a small stream meandering down, cinematic photo, bright morning light.",
+                        "The majestic, rocky peak of the same mountain under a clear summer sky, patches of green tundra, eagles soaring high above, strong midday sun. cinematic photo.",
+                        "The other side of the mountain descending into a valley ablaze with autumn colors, forests of red, orange, and yellow trees, a gentle haze in the air. cinematic photo, golden hour light.",
+                        0.0, 0.0, 0.0, 280, 4, 20240521, "Cosine", 1024, 1152, 1024, 3072, 0,
+                    ],
+                    [
+                        "A futuristic neon-lit city street...",
+                        "The entrance to a grimy nightclub...",
+                        "A dark alleyway off the main street...",
+                        3.5, 3.5, 3.5, 300, 8, 98765, "Cosine", 1024, 1280, 1024, 3240, 0,
+                    ],
                     [
+                        "Iron Man, repulsor rays...",
+                        "Captain America charging forward...",
+                        "Thor wielding Stormbreaker...",
+                        0.0, 0.0, 0.0, 160, 4, 619517442, "Cosine", 1024, 1152, 1024, 3072, 0,
                     ],
                 ],
                 inputs=[
+                    left_prompt, center_prompt, right_prompt,
+                    left_gs, center_gs, right_gs,
+                    overlap, steps, generation_seed, tile_weighting_method,
+                    tile_height, tile_width, height, width, hdr,
                 ],
                 fn=run_for_examples,
                 outputs=result,
                 cache_examples=False,
             )

+    # Event Handling
     event_calc_tile_size = {
         "fn": do_calc_tile,
         "inputs": [height, width, overlap],
         "outputs": [tile_height, tile_width, new_target_height, new_target_width],
     }

     predict_inputs = [
+        left_prompt, center_prompt, right_prompt,
+        left_gs, center_gs, right_gs,
+        overlap, steps, generation_seed, tile_weighting_method,
+        prompt_language, tile_height, tile_width,
+        new_target_height, new_target_width, hdr,
     ]

     calc_tile.click(**event_calc_tile_size)
     generate_button.click(
+        fn=clear_result, inputs=None, outputs=result, queue=False,
     ).then(**event_calc_tile_size).then(
+        fn=randomize_seed_fn, inputs=[generation_seed, randomize_seed],
+        outputs=generation_seed, queue=False,
     ).then(
+        fn=predict, inputs=predict_inputs, outputs=result, show_progress="full"
     )

 app.queue().launch(share=True)
flux_pipeline_mod.py
CHANGED
|
@@ -24,40 +24,28 @@ from diffusers.pipelines.flux.pipeline_flux import FluxPipeline, FluxPipelineOut
|
|
| 24 |
|
| 25 |
logger = logging.get_logger(__name__)
|
| 26 |
|
|
|
|
| 27 |
def _adaptive_tile_size(image_size, base_tile_size=512, max_tile_size=1280):
|
| 28 |
-
width, height = image_size
|
| 29 |
-
aspect_ratio = width / height
|
| 30 |
if aspect_ratio > 1:
|
| 31 |
-
tile_width = min(width, max_tile_size)
|
| 32 |
-
tile_height = min(int(tile_width / aspect_ratio), max_tile_size)
|
| 33 |
else:
|
| 34 |
-
tile_height = min(height, max_tile_size)
|
| 35 |
-
|
| 36 |
-
tile_width = max(tile_width, base_tile_size)
|
| 37 |
-
tile_height = max(tile_height, base_tile_size)
|
| 38 |
-
return tile_width, tile_height
|
| 39 |
|
| 40 |
def _calculate_tile_positions(image_dim: int, tile_dim: int, overlap: int) -> List[int]:
|
| 41 |
-
if image_dim <= tile_dim:
|
| 42 |
-
|
| 43 |
-
positions = []
|
| 44 |
-
current_pos = 0
|
| 45 |
-
stride = tile_dim - overlap
|
| 46 |
while True:
|
| 47 |
positions.append(current_pos)
|
| 48 |
-
if current_pos + tile_dim >= image_dim:
|
| 49 |
-
break
|
| 50 |
current_pos += stride
|
| 51 |
-
if current_pos > image_dim - tile_dim:
|
| 52 |
-
|
| 53 |
-
last_pos = positions[-1]
|
| 54 |
-
if last_pos + tile_dim < image_dim:
|
| 55 |
-
positions.append(image_dim - tile_dim)
|
| 56 |
return sorted(list(set(positions)))
|
| 57 |
|
| 58 |
def _tile2pixel_indices(tile_row_pos, tile_col_pos, tile_width, tile_height, image_width, image_height):
|
| 59 |
-
px_row_init = tile_row_pos
|
| 60 |
-
px_col_init = tile_col_pos
|
| 61 |
px_row_end = min(px_row_init + tile_height, image_height)
|
| 62 |
px_col_end = min(px_col_init + tile_width, image_width)
|
| 63 |
return px_row_init, px_row_end, px_col_init, px_col_end
|
|
@@ -69,48 +57,34 @@ def release_memory(device):
|
|
| 69 |
gc.collect()
|
| 70 |
if torch.cuda.is_available():
|
| 71 |
with torch.cuda.device(device):
|
| 72 |
-
torch.cuda.empty_cache()
|
| 73 |
-
torch.cuda.synchronize()
|
| 74 |
|
| 75 |
class FluxMoDTilingPipeline(FluxPipeline):
|
| 76 |
class TileWeightingMethod(Enum):
|
| 77 |
-
COSINE = "Cosine"
|
| 78 |
-
GAUSSIAN = "Gaussian"
|
| 79 |
|
| 80 |
-
def _generate_gaussian_weights(self, tile_width, tile_height, nbatches, device, dtype, sigma=0.
|
| 81 |
-
latent_width = tile_width // self.vae_scale_factor
|
| 82 |
-
|
| 83 |
-
x = np.linspace(-1, 1, latent_width)
|
| 84 |
-
y = np.linspace(-1, 1, latent_height)
|
| 85 |
xx, yy = np.meshgrid(x, y)
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
|
|
|
| 89 |
|
| 90 |
def _generate_cosine_weights(self, tile_width, tile_height, nbatches, device, dtype):
|
| 91 |
-
latent_width = tile_width // self.vae_scale_factor
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
mid_y = (latent_height - 1) / 2
|
| 97 |
-
x_probs = np.cos(np.pi * (x - mid_x) / latent_width)
|
| 98 |
-
y_probs = np.cos(np.pi * (y - mid_y) / latent_height)
|
| 99 |
-
weights_np = np.outer(y_probs, x_probs)
|
| 100 |
-
weights_torch = torch.tensor(weights_np, device=device, dtype=dtype)
|
| 101 |
-
return torch.tile(weights_torch, (nbatches, self.transformer.config.in_channels // 4, 1, 1))
|
| 102 |
|
| 103 |
-
def prepare_tiles_weights(
|
| 104 |
-
self, y_steps, x_steps, tile_height, tile_width, final_height, final_width,
|
| 105 |
-
tile_weighting_method, tile_gaussian_sigma, batch_size, device, dtype
|
| 106 |
-
):
|
| 107 |
tile_weights = np.empty((len(y_steps), len(x_steps)), dtype=object)
|
| 108 |
for row, y_start in enumerate(y_steps):
|
| 109 |
for col, x_start in enumerate(x_steps):
|
| 110 |
_, px_row_end, _, px_col_end = _tile2pixel_indices(y_start, x_start, tile_width, tile_height, final_width, final_height)
|
| 111 |
-
current_tile_h = px_row_end - y_start
|
| 112 |
-
current_tile_w = px_col_end - x_start
|
| 113 |
-
|
| 114 |
if tile_weighting_method == self.TileWeightingMethod.COSINE.value:
|
| 115 |
tile_weights[row, col] = self._generate_cosine_weights(current_tile_w, current_tile_h, batch_size, device, dtype)
|
| 116 |
else:
|
|
@@ -124,17 +98,17 @@ class FluxMoDTilingPipeline(FluxPipeline):
|
|
| 124 |
height: int = 1024,
|
| 125 |
width: int = 1024,
|
| 126 |
negative_prompt: Optional[Union[str, List[List[str]]]] = "",
|
| 127 |
-
num_inference_steps: int =
|
| 128 |
-
guidance_scale: float =
|
| 129 |
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
|
| 130 |
max_tile_size: int = 1024,
|
| 131 |
tile_overlap: int = 256,
|
| 132 |
tile_weighting_method: str = "Cosine",
|
| 133 |
-
tile_gaussian_sigma: float = 0.
|
| 134 |
guidance_scale_tiles: Optional[List[List[float]]] = None,
|
| 135 |
max_sequence_length: int = 512,
|
| 136 |
output_type: Optional[str] = "pil",
|
| 137 |
-
return_dict: bool = True,
|
| 138 |
):
|
| 139 |
device = self._execution_device
|
| 140 |
batch_size = 1
|
|
@@ -146,42 +120,29 @@ class FluxMoDTilingPipeline(FluxPipeline):
|
|
| 146 |
grid_rows, grid_cols = len(prompt), len(prompt[0])
|
| 147 |
tile_width = (width + (grid_cols - 1) * tile_overlap) // grid_cols
|
| 148 |
tile_height = (height + (grid_rows - 1) * tile_overlap) // grid_rows
|
| 149 |
-
tile_width -= tile_width % PIXEL_MULTIPLE
|
| 150 |
-
tile_height -= tile_height % PIXEL_MULTIPLE
|
| 151 |
final_width = tile_width * grid_cols - (grid_cols - 1) * tile_overlap
|
| 152 |
final_height = tile_height * grid_rows - (grid_rows - 1) * tile_overlap
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
logger.info(f"Prompt grid provided. Using fixed {grid_rows}x{grid_cols} grid.")
|
| 158 |
-
logger.info(f"Target resolution: {width}x{height}. Actual resolution: {final_width}x{final_height}.")
|
| 159 |
-
else:
|
| 160 |
final_width, final_height = width, height
|
| 161 |
tile_width, tile_height = _adaptive_tile_size((final_width, final_height), max_tile_size=max_tile_size)
|
| 162 |
-
tile_width -= tile_width % PIXEL_MULTIPLE
|
| 163 |
-
tile_height -= tile_height % PIXEL_MULTIPLE
|
| 164 |
y_steps = _calculate_tile_positions(final_height, tile_height, tile_overlap)
|
| 165 |
x_steps = _calculate_tile_positions(final_width, tile_width, tile_overlap)
|
| 166 |
grid_rows, grid_cols = len(y_steps), len(x_steps)
|
| 167 |
|
| 168 |
logger.info(f"Processing image in a {grid_rows}x{grid_cols} grid of tiles.")
|
| 169 |
|
| 170 |
-
if not isinstance(negative_prompt, list) or not all(isinstance(p, list) for p in negative_prompt):
|
| 171 |
-
negative_prompt = [[negative_prompt] * grid_cols for _ in range(grid_rows)]
|
| 172 |
-
|
| 173 |
text_embeddings = []
|
| 174 |
for r in range(grid_rows):
|
| 175 |
row_embeddings = []
|
| 176 |
for c in range(grid_cols):
|
| 177 |
p = prompt[r][c] if is_prompt_grid else prompt
|
| 178 |
-
np_ = negative_prompt[r][c] if is_prompt_grid else negative_prompt[0][0]
|
| 179 |
prompt_embeds, pooled, text_ids = self.encode_prompt(p, device=device, max_sequence_length=max_sequence_length)
|
| 180 |
-
|
| 181 |
-
row_embeddings.append({
|
| 182 |
-
"prompt_embeds": prompt_embeds, "pooled_prompt_embeds": pooled, "txt_ids": text_ids,
|
| 183 |
-
"neg_prompt_embeds": neg_embeds, "neg_pooled_prompt_embeds": neg_pooled, "neg_txt_ids": neg_ids,
|
| 184 |
-
})
|
| 185 |
text_embeddings.append(row_embeddings)
|
| 186 |
|
| 187 |
prompt_dtype = text_embeddings[0][0]["prompt_embeds"].dtype
|
|
@@ -191,35 +152,21 @@ class FluxMoDTilingPipeline(FluxPipeline):
|
|
| 191 |
latents = randn_tensor(latents_shape, generator=generator, device=device, dtype=prompt_dtype)
|
| 192 |
|
| 193 |
image_seq_len = (tile_height // self.vae_scale_factor // 2) * (tile_width // self.vae_scale_factor // 2)
|
| 194 |
-
mu = calculate_shift(image_seq_len)
|
| 195 |
-
timesteps, _ = retrieve_timesteps(self.scheduler, num_inference_steps, device, mu=mu)
|
| 196 |
|
| 197 |
-
|
| 198 |
-
guidance = torch.tensor([guidance_scale], device=device)
|
| 199 |
-
else:
|
| 200 |
-
guidance = None
|
| 201 |
-
|
| 202 |
-
tile_weights = self.prepare_tiles_weights(
|
| 203 |
-
y_steps, x_steps, tile_height, tile_width, final_height, final_width,
|
| 204 |
-
tile_weighting_method, tile_gaussian_sigma, batch_size, device, latents.dtype
|
| 205 |
-
)
|
| 206 |
|
| 207 |
-
self.text_encoder.to("cpu")
|
| 208 |
-
self.text_encoder_2.to("cpu")
|
| 209 |
release_memory(device)
|
| 210 |
|
| 211 |
with self.progress_bar(total=num_inference_steps) as progress_bar:
|
| 212 |
for i, t in enumerate(timesteps):
|
| 213 |
noise_preds_tiles = np.empty((grid_rows, grid_cols), dtype=object)
|
| 214 |
-
|
| 215 |
for r, y_start in enumerate(y_steps):
|
| 216 |
for c, x_start in enumerate(x_steps):
|
| 217 |
px_r_init, px_r_end, px_c_init, px_c_end = _tile2pixel_indices(y_start, x_start, tile_width, tile_height, final_width, final_height)
|
| 218 |
-
|
| 219 |
-
# Store the PIXEL dimensions of the current tile
|
| 220 |
-
current_tile_pixel_height = px_r_end - px_r_init
|
| 221 |
-
current_tile_pixel_width = px_c_end - px_c_init
|
| 222 |
-
|
| 223 |
r_init, r_end, c_init, c_end = _tile2latent_indices(px_r_init, px_r_end, px_c_init, px_c_end, self.vae_scale_factor)
|
| 224 |
|
| 225 |
tile_latents = latents[:, :, r_init:r_end, c_init:c_end]
|
|
@@ -231,30 +178,19 @@ class FluxMoDTilingPipeline(FluxPipeline):
|
|
| 231 |
timestep = t.expand(b).to(packed_latents.dtype)
|
| 232 |
|
| 233 |
current_gs_value = guidance_scale_tiles[r][c] if (is_prompt_grid and guidance_scale_tiles) else guidance_scale
|
| 234 |
-
current_guidance = torch.tensor([current_gs_value], device=device) if
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
hidden_states=packed_latents, timestep=timestep / 1000, guidance=current_guidance,
|
| 238 |
-
pooled_projections=embeds["neg_pooled_prompt_embeds"],
|
| 239 |
-
encoder_hidden_states=embeds["neg_prompt_embeds"],
|
| 240 |
-
txt_ids=embeds["neg_txt_ids"], img_ids=latent_image_ids,
|
| 241 |
-
)[0]
|
| 242 |
-
|
| 243 |
-
noise_pred_text_packed = self.transformer(
|
| 244 |
hidden_states=packed_latents, timestep=timestep / 1000, guidance=current_guidance,
|
| 245 |
pooled_projections=embeds["pooled_prompt_embeds"],
|
| 246 |
encoder_hidden_states=embeds["prompt_embeds"],
|
| 247 |
txt_ids=embeds["txt_ids"], img_ids=latent_image_ids,
|
| 248 |
)[0]
|
| 249 |
-
|
| 250 |
-
# Pass the correct PIXEL dimensions of the tile to _unpack_latents
|
| 251 |
-
noise_pred_uncond = self._unpack_latents(noise_pred_uncond_packed, current_tile_pixel_height, current_tile_pixel_width, self.vae_scale_factor)
|
| 252 |
-
noise_pred_text = self._unpack_latents(noise_pred_text_packed, current_tile_pixel_height, current_tile_pixel_width, self.vae_scale_factor)
|
| 253 |
|
| 254 |
-
noise_pred_tile =
|
| 255 |
noise_preds_tiles[r, c] = noise_pred_tile
|
| 256 |
|
| 257 |
-
#
|
| 258 |
noise_pred = torch.zeros_like(latents)
|
| 259 |
contributors = torch.zeros_like(latents)
|
| 260 |
for r, y_start in enumerate(y_steps):
|
|
@@ -268,19 +204,17 @@ class FluxMoDTilingPipeline(FluxPipeline):
|
|
| 268 |
|
| 269 |
latents_dtype = latents.dtype
|
| 270 |
latents = self.scheduler.step(noise_pred, t, latents)[0]
|
| 271 |
-
if latents.dtype != latents_dtype:
|
| 272 |
-
latents = latents.to(latents_dtype)
|
| 273 |
-
|
| 274 |
progress_bar.update()
|
| 275 |
|
| 276 |
# Post-processing
|
| 277 |
-
if output_type == "latent":
|
| 278 |
-
image = latents
|
| 279 |
else:
|
| 280 |
self.vae.to(device)
|
| 281 |
latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor
|
| 282 |
image = self.vae.decode(latents.to(self.vae.dtype))[0]
|
| 283 |
image = self.image_processor.postprocess(image, output_type=output_type)
|
| 284 |
|
| 285 |
-
self.maybe_free_model_hooks()
|
|
|
|
| 286 |
return FluxPipelineOutput(images=image)
|
|
|
|
| 24 |
|
| 25 |
logger = logging.get_logger(__name__)
|
| 26 |
|
| 27 |
+
|
| 28 |
def _adaptive_tile_size(image_size, base_tile_size=512, max_tile_size=1280):
|
| 29 |
+
width, height = image_size; aspect_ratio = width / height
|
|
|
|
| 30 |
if aspect_ratio > 1:
|
| 31 |
+
tile_width = min(width, max_tile_size); tile_height = min(int(tile_width / aspect_ratio), max_tile_size)
|
|
|
|
| 32 |
else:
|
| 33 |
+
tile_height = min(height, max_tile_size); tile_width = min(int(tile_height * aspect_ratio), max_tile_size)
|
| 34 |
+
return max(tile_width, base_tile_size), max(tile_height, base_tile_size)
|
|
|
|
|
|
|
|
|
|
| 35 |
|
| 36 |
def _calculate_tile_positions(image_dim: int, tile_dim: int, overlap: int) -> List[int]:
|
| 37 |
+
if image_dim <= tile_dim: return [0]
|
| 38 |
+
positions = []; current_pos = 0; stride = tile_dim - overlap
|
|
|
|
|
|
|
|
|
|
| 39 |
while True:
|
| 40 |
positions.append(current_pos)
|
| 41 |
+
if current_pos + tile_dim >= image_dim: break
|
|
|
|
| 42 |
current_pos += stride
|
| 43 |
+
if current_pos > image_dim - tile_dim: break
|
| 44 |
+
if positions[-1] + tile_dim < image_dim: positions.append(image_dim - tile_dim)
|
|
|
|
|
|
|
|
|
|
| 45 |
return sorted(list(set(positions)))
|
| 46 |
|
| 47 |
def _tile2pixel_indices(tile_row_pos, tile_col_pos, tile_width, tile_height, image_width, image_height):
|
| 48 |
+
px_row_init = tile_row_pos; px_col_init = tile_col_pos
|
|
|
|
| 49 |
px_row_end = min(px_row_init + tile_height, image_height)
|
| 50 |
px_col_end = min(px_col_init + tile_width, image_width)
|
| 51 |
return px_row_init, px_row_end, px_col_init, px_col_end
|
|
|
|
| 57 |
gc.collect()
|
| 58 |
if torch.cuda.is_available():
|
| 59 |
with torch.cuda.device(device):
|
| 60 |
+
torch.cuda.empty_cache(); torch.cuda.synchronize()
|
|
|
|
| 61 |
|
| 62 |
class FluxMoDTilingPipeline(FluxPipeline):
|
| 63 |
class TileWeightingMethod(Enum):
|
| 64 |
+
COSINE = "Cosine"; GAUSSIAN = "Gaussian"
|
|
|
|
| 65 |
|
| 66 |
+
def _generate_gaussian_weights(self, tile_width, tile_height, nbatches, device, dtype, sigma=0.4):
|
| 67 |
+
latent_width, latent_height = tile_width // self.vae_scale_factor, tile_height // self.vae_scale_factor
|
| 68 |
+
x, y = np.linspace(-1, 1, latent_width), np.linspace(-1, 1, latent_height)
|
|
|
|
|
|
|
| 69 |
xx, yy = np.meshgrid(x, y)
|
| 70 |
+
gaussian_weight_np = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
|
| 71 |
+
weights_torch_f32 = torch.tensor(gaussian_weight_np, device=device, dtype=torch.float32)
|
| 72 |
+
weights_torch_target_dtype = weights_torch_f32.to(dtype)
|
| 73 |
+
return torch.tile(weights_torch_target_dtype, (nbatches, self.transformer.config.in_channels // 4, 1, 1))
|
| 74 |
|
| 75 |
def _generate_cosine_weights(self, tile_width, tile_height, nbatches, device, dtype):
|
| 76 |
+
latent_width, latent_height = tile_width // self.vae_scale_factor, tile_height // self.vae_scale_factor
|
| 77 |
+
x, y = np.arange(latent_width), np.arange(latent_height)
|
| 78 |
+
mid_x, mid_y = (latent_width - 1) / 2, (latent_height - 1) / 2
|
| 79 |
+
x_probs, y_probs = np.cos(np.pi * (x - mid_x) / latent_width), np.cos(np.pi * (y - mid_y) / latent_height)
|
| 80 |
+
return torch.tile(torch.tensor(np.outer(y_probs, x_probs), device=device, dtype=dtype), (nbatches, self.transformer.config.in_channels // 4, 1, 1))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 81 |
|
| 82 |
+
def prepare_tiles_weights(self, y_steps, x_steps, tile_height, tile_width, final_height, final_width, tile_weighting_method, tile_gaussian_sigma, batch_size, device, dtype):
|
|
|
|
|
|
|
|
|
|
| 83 |
tile_weights = np.empty((len(y_steps), len(x_steps)), dtype=object)
|
| 84 |
for row, y_start in enumerate(y_steps):
|
| 85 |
for col, x_start in enumerate(x_steps):
|
| 86 |
_, px_row_end, _, px_col_end = _tile2pixel_indices(y_start, x_start, tile_width, tile_height, final_width, final_height)
|
| 87 |
+
current_tile_h, current_tile_w = px_row_end - y_start, px_col_end - x_start
|
|
|
|
|
|
|
| 88 |
if tile_weighting_method == self.TileWeightingMethod.COSINE.value:
|
| 89 |
tile_weights[row, col] = self._generate_cosine_weights(current_tile_w, current_tile_h, batch_size, device, dtype)
|
| 90 |
else:
|
|
|
|
| 98 |
height: int = 1024,
|
| 99 |
width: int = 1024,
|
| 100 |
negative_prompt: Optional[Union[str, List[List[str]]]] = "",
|
| 101 |
+
num_inference_steps: int = 4,
|
| 102 |
+
guidance_scale: float = 0.0,
|
| 103 |
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
|
| 104 |
max_tile_size: int = 1024,
|
| 105 |
tile_overlap: int = 256,
|
| 106 |
tile_weighting_method: str = "Cosine",
|
| 107 |
+
tile_gaussian_sigma: float = 0.4,
|
| 108 |
guidance_scale_tiles: Optional[List[List[float]]] = None,
|
| 109 |
max_sequence_length: int = 512,
|
| 110 |
output_type: Optional[str] = "pil",
|
| 111 |
+
return_dict: bool = True,
|
| 112 |
):
|
| 113 |
device = self._execution_device
|
| 114 |
batch_size = 1
|
|
|
|
| 120 |
grid_rows, grid_cols = len(prompt), len(prompt[0])
|
| 121 |
tile_width = (width + (grid_cols - 1) * tile_overlap) // grid_cols
|
| 122 |
tile_height = (height + (grid_rows - 1) * tile_overlap) // grid_rows
|
| 123 |
+
tile_width -= tile_width % PIXEL_MULTIPLE; tile_height -= tile_height % PIXEL_MULTIPLE
|
|
|
|
| 124 |
final_width = tile_width * grid_cols - (grid_cols - 1) * tile_overlap
|
| 125 |
final_height = tile_height * grid_rows - (grid_rows - 1) * tile_overlap
|
| 126 |
+
x_steps = [i * (tile_width - tile_overlap) for i in range(grid_cols)]
|
| 127 |
+
y_steps = [i * (tile_height - tile_overlap) for i in range(grid_rows)]
|
| 128 |
+
logger.info(f"Prompt grid provided. Using fixed {grid_rows}x{grid_cols} grid. Actual resolution: {final_width}x{final_height}.")
|
| 129 |
+
else: # Tiling Mode
|
|
|
|
|
|
|
|
|
|
| 130 |
final_width, final_height = width, height
|
| 131 |
tile_width, tile_height = _adaptive_tile_size((final_width, final_height), max_tile_size=max_tile_size)
|
| 132 |
+
tile_width -= tile_width % PIXEL_MULTIPLE; tile_height -= tile_height % PIXEL_MULTIPLE
|
|
|
|
| 133 |
y_steps = _calculate_tile_positions(final_height, tile_height, tile_overlap)
|
| 134 |
x_steps = _calculate_tile_positions(final_width, tile_width, tile_overlap)
|
| 135 |
grid_rows, grid_cols = len(y_steps), len(x_steps)
|
| 136 |
|
| 137 |
logger.info(f"Processing image in a {grid_rows}x{grid_cols} grid of tiles.")
|
| 138 |
|
|
|
|
|
|
|
|
|
|
| 139 |
text_embeddings = []
|
| 140 |
for r in range(grid_rows):
|
| 141 |
row_embeddings = []
|
| 142 |
for c in range(grid_cols):
|
| 143 |
p = prompt[r][c] if is_prompt_grid else prompt
|
|
|
|
| 144 |
prompt_embeds, pooled, text_ids = self.encode_prompt(p, device=device, max_sequence_length=max_sequence_length)
|
| 145 |
+
row_embeddings.append({"prompt_embeds": prompt_embeds, "pooled_prompt_embeds": pooled, "txt_ids": text_ids})
|
|
|
|
|
|
|
|
|
|
|
|
|
| 146 |
text_embeddings.append(row_embeddings)
|
| 147 |
|
| 148 |
prompt_dtype = text_embeddings[0][0]["prompt_embeds"].dtype
|
|
|
|
         latents = randn_tensor(latents_shape, generator=generator, device=device, dtype=prompt_dtype)
 
         image_seq_len = (tile_height // self.vae_scale_factor // 2) * (tile_width // self.vae_scale_factor // 2)
+        mu = calculate_shift(image_seq_len); timesteps, _ = retrieve_timesteps(self.scheduler, num_inference_steps, device, mu=mu)
 
+        tile_weights = self.prepare_tiles_weights(y_steps, x_steps, tile_height, tile_width, final_height, final_width, tile_weighting_method, tile_gaussian_sigma, batch_size, device, latents.dtype)
 
+        self.text_encoder.to("cpu")
+        self.text_encoder_2.to("cpu")
         release_memory(device)
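Inside the denoising loop that follows, each tile's latents are packed into FLUX's token sequence before the transformer call (the packing itself, presumably via the pipeline's `_pack_latents`, sits in an elided hunk). A simplified sketch of that 2x2 patch packing, mirroring the behavior of diffusers' `FluxPipeline._pack_latents`:

```python
# Simplified sketch of FLUX's 2x2 latent packing (behavioral illustration).
import torch

def pack_latents(latents: torch.Tensor) -> torch.Tensor:
    # (B, C, H, W) -> (B, H/2 * W/2, 4*C): every 2x2 latent patch becomes one token.
    b, c, h, w = latents.shape
    x = latents.view(b, c, h // 2, 2, w // 2, 2)
    x = x.permute(0, 2, 4, 1, 3, 5)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)

tile = torch.randn(1, 16, 128, 128)  # a 1024x1024 tile: 1024 / vae_scale_factor(8) = 128
print(pack_latents(tile).shape)      # torch.Size([1, 4096, 64]); matches image_seq_len = 4096
```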
         with self.progress_bar(total=num_inference_steps) as progress_bar:
             for i, t in enumerate(timesteps):
                 noise_preds_tiles = np.empty((grid_rows, grid_cols), dtype=object)
                 for r, y_start in enumerate(y_steps):
                     for c, x_start in enumerate(x_steps):
                         px_r_init, px_r_end, px_c_init, px_c_end = _tile2pixel_indices(y_start, x_start, tile_width, tile_height, final_width, final_height)
+                        current_tile_pixel_height = px_r_end - px_r_init; current_tile_pixel_width = px_c_end - px_c_init
                         r_init, r_end, c_init, c_end = _tile2latent_indices(px_r_init, px_r_end, px_c_init, px_c_end, self.vae_scale_factor)
 
                         tile_latents = latents[:, :, r_init:r_end, c_init:c_end]
...
                         timestep = t.expand(b).to(packed_latents.dtype)
 
                         current_gs_value = guidance_scale_tiles[r][c] if (is_prompt_grid and guidance_scale_tiles) else guidance_scale
+                        current_guidance = torch.tensor([current_gs_value], device=device) if self.transformer.config.guidance_embeds else None
+
+                        noise_pred_packed = self.transformer(
                             hidden_states=packed_latents, timestep=timestep / 1000, guidance=current_guidance,
                             pooled_projections=embeds["pooled_prompt_embeds"],
                             encoder_hidden_states=embeds["prompt_embeds"],
                             txt_ids=embeds["txt_ids"], img_ids=latent_image_ids,
                         )[0]
 
+                        noise_pred_tile = self._unpack_latents(noise_pred_packed, current_tile_pixel_height, current_tile_pixel_width, self.vae_scale_factor)
                         noise_preds_tiles[r, c] = noise_pred_tile
 
+                # Stitching and Scheduler step (no changes)
                 noise_pred = torch.zeros_like(latents)
                 contributors = torch.zeros_like(latents)
                 for r, y_start in enumerate(y_steps):
...
                 latents_dtype = latents.dtype
                 latents = self.scheduler.step(noise_pred, t, latents)[0]
+                if latents.dtype != latents_dtype: latents = latents.to(latents_dtype)
                 progress_bar.update()
 
         # Post-processing
+        if output_type == "latent": image = latents
         else:
             self.vae.to(device)
             latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor
             image = self.vae.decode(latents.to(self.vae.dtype))[0]
             image = self.image_processor.postprocess(image, output_type=output_type)
 
+        self.maybe_free_model_hooks();
+
         return FluxPipelineOutput(images=image)
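The stitching loop above is partially elided, but its principle is a per-position weighted average: each tile's noise prediction is accumulated with its weight map into `noise_pred`, the weights themselves into `contributors`, and the ratio gives the blended prediction. A toy 1-D illustration (values and names are illustrative, not the pipeline's):

```python
# Toy 1-D illustration of weighted tile stitching.
import torch

acc = torch.zeros(6)        # accumulated weighted predictions (like noise_pred)
contrib = torch.zeros(6)    # accumulated weights (like contributors)
weight = torch.tensor([0.25, 1.0, 1.0, 0.25])  # edge-faded tile window

tiles = {0: torch.full((4,), 1.0),   # tile covering positions 0..3
         2: torch.full((4,), 3.0)}   # tile covering positions 2..5
for x0, pred in tiles.items():
    acc[x0:x0 + 4] += pred * weight
    contrib[x0:x0 + 4] += weight

print((acc / contrib).tolist())  # ~[1.0, 1.0, 1.4, 2.6, 3.0, 3.0]: the overlap cross-fades
```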
infer.py CHANGED
@@ -1,36 +1,49 @@
 # infer.py
-# A command-line inference script to test the
+# A command-line inference script to test the FluxMoDTilingPipeline.
 # This script runs the first example from the Gradio app to verify functionality
 # and observe the progress bar in the terminal.
 
+import os
 import torch
-from PIL import Image
 import time
 
-# Make sure
+# Make sure flux_pipeline_mod.py is in the same directory
 from flux_pipeline_mod import FluxMoDTilingPipeline
+# Conditional MMGP Setup based on Environment Variable
+USE_MMGP_ENV = os.getenv('USE_MMGP', 'true').lower()
+USE_MMGP = USE_MMGP_ENV not in ('false', '0', 'no', 'none')
 
 # Optional: for memory offloading
...
+if USE_MMGP:
+    try:
+        from mmgp import offload, profile_type
+    except ImportError:
+        print("Warning: 'mmgp' library not found. Offload will not be applied.")
+        offload = None
+else:
+    print("INFO: MMGP is disabled.")
+
 def main():
     """Main function to run the inference process."""
 
     # 1. Load Model
     print("--- 1. Loading Model ---")
     # !!! IMPORTANT: Make sure this path is correct for your system !!!
-    MODEL_PATH = "F:\\Models\\FLUX.1-schnell"
+    #MODEL_PATH = "F:\\Models\\FLUX.1-schnell"
+    MODEL_PATH = "black-forest-labs/FLUX.1-schnell"
 
     start_load_time = time.time()
...
+    if USE_MMGP:
+        pipe = FluxMoDTilingPipeline.from_pretrained(
+            MODEL_PATH,
+            torch_dtype=torch.bfloat16
+        )
+    else:
+        pipe = FluxMoDTilingPipeline.from_pretrained(
+            MODEL_PATH,
+            torch_dtype=torch.bfloat16
+        ).to("cuda")
+
     # Apply memory optimization
     if offload:
         print("Applying LowRAM_LowVRAM offload profile...")
@@ -57,8 +70,7 @@ def main():
         "Captain America charging forward, vibranium shield deflecting energy blasts in destroyed cityscape, cinematic composition. Focus: Captain America.",
         "Thor wielding Stormbreaker in destroyed cityscape, lightning crackling, powerful strike downwards, cinematic photography. Focus: Thor."
     ]]
-
-
+
     # Tiling and Dimensions
     target_height = 1024
     target_width = 3072
@@ -66,17 +78,14 @@
     tile_weighting_method = "Cosine"
 
     # Generation
-    num_inference_steps =
-
+    num_inference_steps = 4
+    guidance_scale = 0.0
    seed = 619517442
 
     # Create a generator for reproducibility
-    generator = torch.Generator("cuda"
+    generator = torch.Generator("cuda").manual_seed(seed)
 
-    print("
-    print(f" Resolution: {target_width}x{target_height}")
-    print(f" Steps: {num_inference_steps}")
-    print(f" Seed: {seed}")
+    print(f"Resolution: {target_width}x{target_height}, Steps: {num_inference_steps}, Guidance: {guidance_scale}")
 
     # 3. Start Inference
     print("\n--- 3. Starting Inference ---")
@@ -86,12 +95,11 @@ def main():
     image = pipe(
         prompt=prompt_grid,
         height=target_height,
-        width=target_width,
-        negative_prompt=negative_prompt,
+        width=target_width,
         tile_overlap=tile_overlap,
-
-        tile_weighting_method=tile_weighting_method,
+        guidance_scale=guidance_scale,
         generator=generator,
+        tile_weighting_method=tile_weighting_method,
         num_inference_steps=num_inference_steps
     ).images[0]
@@ -100,7 +108,7 @@ def main():
 
     # 4. Save Output
     print("\n--- 4. Saving Output ---")
-    output_filename = "inference_output.png"
+    output_filename = "outputs/inference_output.png"
     image.save(output_filename)
     print(f"Image successfully saved as '{output_filename}'")
 
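Two small caveats are worth noting when running the updated script; a hedged hardening sketch (these lines are suggestions, not part of the commit):

```python
# Hedged hardening sketch for infer.py.
import os

offload = None  # default binding, so the later `if offload:` check works when USE_MMGP is false

os.makedirs("outputs", exist_ok=True)  # the new save path assumes this folder already exists
```

With those in place, the offload path can be toggled per run via the environment variable the script reads, e.g. `USE_MMGP=false python infer.py`.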
requirements.txt CHANGED
@@ -11,4 +11,6 @@ hf_xet
 protobuf
 sentencepiece
 ligo-segments
-scipy
+scipy
+triton-windows<3.5; sys_platform == 'win32'
+triton==3.4.0; sys_platform != 'win32'
requirements_local.txt CHANGED
@@ -14,4 +14,6 @@ mmgp
 protobuf
 sentencepiece
 ligo-segments
-scipy
+scipy
+triton-windows<3.5; sys_platform == 'win32'
+triton==3.4.0; sys_platform != 'win32'