Update README.md to include a quantization code snippet
#7
opened by sayakpaul (HF Staff)
Cc: @YiYiXu @marcsun13
sayakpaul changed pull request title from "Update README.md" to "Update README.md to include a quantization code snippet"
I tested this PR with the latest diffusers, bitsandbytes, and transformers from git, and I think there might be some issues:

- Steps is set to 4 in the quantized example, which I believe is wrong. The non-quantized example uses 50 steps, and using 4 steps with this code just results in a blur.
- This model seems extremely sensitive to NF4 quantization: running this code (even modified for 50 steps, as in the sketch after this list) results in a very grainy image.
- The PR'd example removes the "Ultra HD, 4K, cinematic composition." suffix, which seems to be needed to get decent (if grainy) results.
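For reference, here is a minimal sketch of the adjusted call used in testing. It assumes `pipe` is the NF4-quantized pipeline from the PR's example; only the step count and prompt suffix differ from the PR as posted.

```python
# Sketch: the PR's quantized example, modified to use 50 steps and the
# "Ultra HD, 4K, cinematic composition." suffix from the non-quantized
# README example. `pipe` is assumed to be the NF4-quantized pipeline.
prompt = (
    "A cat holding a sign that says hello world, "
    "Ultra HD, 4K, cinematic composition."
)
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_nf4_50steps.png")
```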
EDIT: for anyone looking to run this quantized, Optimum Quanto (code taken from the example here: https://github.com/QwenLM/Qwen-Image/pull/6/files) seems to work much better than bitsandbytes NF4.
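For anyone who wants the gist without opening the linked diff, this is roughly what the Optimum Quanto route looks like. It is a minimal sketch, assuming `optimum-quanto` is installed and that `qint8` weights are used; check the linked PR for the exact configuration.

```python
# Minimal sketch of the Optimum Quanto approach (pip install optimum-quanto).
# The qint8 weight dtype is an assumption; see the linked Qwen-Image PR
# for the exact settings used there.
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, qint8, quantize

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
)
quantize(pipe.transformer, weights=qint8)  # swap in quantized linear weights
freeze(pipe.transformer)                   # materialize the quantized tensors
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_quanto.png")
```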
Thanks for testing. Quanto takes a bit of time. Our team looked into it a bit and here is a working snippet:
```python
# make sure bitsandbytes is installed: `pip install -U bitsandbytes`
import torch
from diffusers import DiffusionPipeline, PipelineQuantizationConfig
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

quant_config = PipelineQuantizationConfig(
    quant_mapping={
        "transformer": DiffusersBitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4",
            # keep these sensitive modules unquantized
            llm_int8_skip_modules=[
                "time_text_embed",
                "img_in",
                "norm_out",
                "proj_out",
                "img_mod",
                "txt_mod",
            ],
        ),
        "text_encoder": TransformersBitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_nf4.png")
```
Here is the pre-quantized checkpoint: https://huggingface.co/diffusers/qwen-image-nf4/
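If you use it, loading should be a plain `from_pretrained` call. A minimal sketch, assuming the bitsandbytes quantization config is serialized with the checkpoint:

```python
# Sketch: loading the pre-quantized NF4 checkpoint directly, assuming the
# quantization config ships with the repo (no PipelineQuantizationConfig needed).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "diffusers/qwen-image-nf4", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A cat holding a sign that says hello world, Ultra HD, 4K, cinematic composition"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwenimage_nf4.png")
```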
Thanks @mdouglas for the investigation.
Can confirm that works, thanks! Note that there was a missing comma after "time_text_embed" in the snippet as originally posted.
Text alignment improved when the text encoder was left unquantized:
```python
import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline, PipelineQuantizationConfig

quant_config = PipelineQuantizationConfig(
    quant_mapping={
        # quantize only the transformer; the text encoder stays in bf16
        "transformer": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4",
            llm_int8_skip_modules=[
                "time_text_embed",
                "img_in",
                "norm_out",
                "proj_out",
                "img_mod",
                "txt_mod",
            ],
        ),
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM down with the unquantized text encoder

prompt = "Japanese anime style, a cat holding a sign that says hello world"
negative_prompt = "3d, cg, photo"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
).images[0]
image.save("qwenimage_nf4.png")
```
sayakpaul changed pull request status to closed