Update README.md to include a quantization code snippet

#7
by sayakpaul (HF Staff) · opened
sayakpaul changed pull request title from "Update README.md" to "Update README.md to include a quantization code snippet"

I tested this PR with the latest diffusers, bitsandbytes, and transformers from git, and I think there are some issues:

  1. The quantized example uses 4 steps, which I believe is wrong: the non-quantized example uses 50 steps, and running this code with 4 steps just produces a blur.
  2. This model seems extremely sensitive to NF4 quantization, and running this code (even modified to 50 steps) produces a very grainy image.
  3. The PR'd example removes the "Ultra HD, 4K, cinematic composition." suffix, which seems to be needed to get decent (if grainy) results.

EDIT: for anyone looking to run this quantized, Optimum Quanto (code taken from the example at https://github.com/QwenLM/Qwen-Image/pull/6/files) seems to work much better than bitsandbytes NF4.
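
For reference, a minimal Quanto sketch along those lines (not the exact code from the linked PR; the qint8 weight type and quantizing only the transformer are assumptions):

Code
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import quantize, freeze, qint8  # pip install optimum-quanto

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)

# quantize the transformer weights in place, then freeze to drop the bf16 originals
quantize(pipe.transformer, weights=qint8)
freeze(pipe.transformer)

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_quanto.png")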

Thanks for testing. Quanto takes a bit of time. Our team looked into it a bit, and here is a working snippet:

Code
# make sure bitsandbytes is installed: `pip install -U bitsandbytes`

from diffusers import DiffusionPipeline, PipelineQuantizationConfig
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
import torch

quant_config = PipelineQuantizationConfig(
    quant_mapping={
        "transformer": DiffusersBitsAndBytesConfig(
            load_in_4bit=True, 
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4",
            llm_int8_skip_modules=[
                "time_text_embed",
                "img_in",
                "norm_out",
                "proj_out",
                "img_mod",
                "txt_mod",
            ],
        ),
        "text_encoder": TransformersBitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
        ),
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_nf4.png")
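
(The llm_int8_skip_modules list keeps the quantization-sensitive embedding, modulation, and output-projection layers in bf16, which appears to be what avoids the graininess reported above.)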
Results

[image attached: sample NF4 output]

Here is the pre-quantized checkpoint: https://huggingface.co/diffusers/qwen-image-nf4/
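
Loading the pre-quantized checkpoint should look roughly like this (a sketch; the quantization settings are stored in the checkpoint, so no quantization_config is needed):

Code
import torch
from diffusers import DiffusionPipeline

# the NF4 config ships with the checkpoint; bitsandbytes must still be installed
pipe = DiffusionPipeline.from_pretrained(
    "diffusers/qwen-image-nf4",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_nf4.png")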

Thanks @mdouglas for the investigation.

Can confirm that works, thanks! Note: the snippet as originally posted was missing a comma after "time_text_embed" (fixed above).

Text alignment improved when the text encoder was left unquantized.

Code
from diffusers import DiffusionPipeline, PipelineQuantizationConfig
from diffusers import BitsAndBytesConfig
import torch

quant_config = PipelineQuantizationConfig(
    quant_mapping={
        "transformer": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4",
            llm_int8_skip_modules=[
                "time_text_embed",
                "img_in",
                "norm_out",
                "proj_out",
                "img_mod",
                "txt_mod",
            ],
        ),
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

prompt = "Japanese anime style, a cat holding a sign that says hello world"
negative_prompt = "3d, cg, photo"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
).images[0]
image.save("qwenimage_nf4.png")
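
(Since the text encoder stays in bf16 here, peak memory is higher; enable_model_cpu_offload() compensates by keeping only the active component on the GPU.)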
Result

[image attached: sample output with unquantized text encoder]

Indeed. @OzzyGT from our team has a nice snippet that works across prompts. So, I will close mine in favor of his PR. Thanks for the discussions, folks!

sayakpaul changed pull request status to closed
