v0.31.0
Release time: 2024-10-22 22:15:27
Latest release of huggingface/diffusers: v0.31.0 (2024-10-22 22:15:27)
v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more
Stable Diffusion 3.5 Large
Stability AI’s latest text-to-image generation model is Stable Diffusion 3.5 Large. SD3.5 Large is the next iteration of Stable Diffusion 3. It comes with two checkpoints (both of which have 8B params):
- A regular one
- A timestep-distilled one enabling few-step inference
Make sure to fill out the access form on the model page, and then run huggingface-cli login before running the code below.
# make sure to update diffusers
# pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")
image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]
image.save("sd3_hello_world.png")
See the documentation to learn more.
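For the timestep-distilled checkpoint, a few-step call might look like the sketch below. The model ID stabilityai/stable-diffusion-3.5-large-turbo, the 4-step setting, and guidance_scale=0.0 are assumptions taken from the public model card rather than from these notes, and both helper names are illustrative. The heavy imports sit inside the function so the settings can be inspected without a GPU.

```python
def few_step_settings(num_inference_steps: int = 4) -> dict:
    """Call arguments for a distilled checkpoint (assumed values, see lead-in)."""
    return {
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0.0,  # assumption: the distilled model runs without CFG
        "height": 1024,
        "width": 1024,
    }


def generate_few_step(prompt: str,
                      model_id: str = "stabilityai/stable-diffusion-3.5-large-turbo"):
    """Load the (assumed) distilled checkpoint and generate one image."""
    # Imported lazily: these need diffusers, torch, and a CUDA device.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to("cuda")
    return pipe(prompt=prompt, **few_step_settings()).images[0]
```

Calling generate_few_step("a photo of a cat holding a sign that says hello world") would then return a PIL image, provided access to the gated checkpoint has been granted.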
CogView3-Plus
We added a new text-to-image model, CogView3-Plus, from the THUDM team! The model is DiT-based and supports image generation at resolutions from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!
from diffusers import CogView3PlusPipeline
import torch
pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")
# Enable it to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]
image.save("cogview3.png")
Refer to the documentation to learn more.
Quantization
We have landed native quantization support in Diffusers, starting with bitsandbytes as its first quantization backend. With this, we hope to see large diffusion models becoming much more accessible to run on consumer hardware.
The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
import torch
ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
Then, we use model_nf4 to instantiate the FluxPipeline:
from diffusers import FluxPipeline
pipeline = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
See the quantization documentation to learn more. Additionally, check out the Colab Notebook that runs Flux.1 Dev end to end with NF4 quantization.
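To get a rough feel for why 4-bit weights matter on consumer hardware, the back-of-the-envelope estimate below compares weight storage at different precisions. It is only a sketch: the 8B parameter count is borrowed from the SD3.5 Large section above as a convenient figure, the helper name is illustrative, and the estimate ignores quantization metadata, activations, text encoders, and the VAE.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 8e9  # assumption: an 8B-parameter transformer, as quoted for SD3.5 Large

for label, bits in [("bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(num_params, bits):.0f} GB")
```

Under these assumptions the weights shrink from about 16 GB in bf16 to about 4 GB in NF4. Real memory use will be higher once the omitted components are included.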
Training scripts
We have a fresh bucket of training scripts with this release:
Video model fine-tuning can be quite expensive. So, we have worked on a repository, cogvideox-factory, which provides memory-optimized scripts to fine-tune the Cog family of models.
Misc
- We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
- Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!
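Loading one of these community LoRAs into an existing Flux pipeline could look like the sketch below. The helper name, the adapter name, and the repository and file names in the usage note are hypothetical placeholders. load_lora_weights and set_adapters are the pipeline methods used, and the helper only relies on the object exposing them.

```python
def load_flux_lora(pipe, repo_id, weight_name=None, adapter_name="style", scale=1.0):
    """Attach a LoRA to an already-loaded pipeline and set its strength."""
    # Fetch the LoRA weights and register them under a named adapter.
    pipe.load_lora_weights(repo_id, weight_name=weight_name, adapter_name=adapter_name)
    # Activate the adapter with the requested weight.
    pipe.set_adapters([adapter_name], adapter_weights=[scale])
    return pipe
```

With a loaded FluxPipeline called pipeline, a call such as load_flux_lora(pipeline, "some-user/some-flux-lora", weight_name="lora.safetensors", scale=0.8) would apply the adapter at 80% strength; both names in that call are placeholders.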
All commits
- Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
- Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
- set max_shard_size to None for pipeline save_pretrained by @a-r-r-o-w in #9447
- adapt masked im2im pipeline for SDXL by @noskill in #7790
- [Flux] add lora integration tests. by @sayakpaul in #9353
- [training] CogVideoX Lora by @a-r-r-o-w in #9302
- Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
- [refactor] LoRA tests by @a-r-r-o-w in #9481
- [CI] fix nightly model tests by @sayakpaul in #9483
- [Cog] some minor fixes and nits by @sayakpaul in #9466
- [Tests] Reduce the model size in the lumina test by @saqlain2204 in #8985
- Fix the bug of sd3 controlnet training when using gradient checkpointing. by @pibbo88 in #9498
- [Schedulers] Add exponential sigmas / exponential noise schedule by @hlky in #9499
- Allow DDPMPipeline half precision by @sbinnee in #9222
- Add Noise Schedule/Schedule Type to Schedulers Overview documentation by @hlky in #9504
- fix bugs for sd3 controlnet training by @xduzhangjiayu in #9489
- [Doc] Fix path and and also import imageio by @LukeLIN-web in #9506
- [CI] allow faster downloads from the Hub in CI. by @sayakpaul in #9478
- a few fix for SingleFile tests by @yiyixuxu in #9522
- Add exponential sigmas to other schedulers and update docs by @hlky in #9518
- [Community Pipeline] Batched implementation of Flux with CFG by @sayakpaul in #9513
- Update community_projects.md by @lee101 in #9266
- [docs] Model sharding by @stevhliu in #9521
- update get_parameter_dtype by @yiyixuxu in #9526
- [Doc] Improved level of clarity for latents_to_rgb. by @LagPixelLOL in #9529
- [Schedulers] Add beta sigmas / beta noise schedule by @hlky in #9509
- flux controlnet fix (control_modes batch & others) by @yiyixuxu in #9507
- [Tests] Fix ChatGLMTokenizer by @asomoza in #9536
- [bug] Precedence of operations in VAE should be slicing -> tiling by @a-r-r-o-w in #9342
- [LoRA] make set_adapters() method more robust. by @sayakpaul in #9535
- [examples] add train flux-controlnet scripts in example. by @PromeAIpro in #9324
- [Tests] [LoRA] clean up the serialization stuff. by @sayakpaul in #9512
- [Core] fix variant-identification. by @sayakpaul in #9253
- [refactor] remove conv_cache from CogVideoX VAE by @a-r-r-o-w in #9524
- [train_instruct_pix2pix.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @AnandK27 in #9316
- [chore] fix: retain memory utility. by @sayakpaul in #9543
- [LoRA] support Kohya Flux LoRAs that have text encoders as well by @sayakpaul in #9542
- Add beta sigmas to other schedulers and update docs by @hlky in #9538
- Add PAG support to StableDiffusionControlNetPAGInpaintPipeline by @juancopi81 in #8875
- Support bfloat16 for Upsample2D by @darhsu in #9480
- fix cogvideox autoencoder decode by @Xiang-cd in #9569
- [sd3] make sure height and size are divisible by 16 by @yiyixuxu in #9573
- fix xlabs FLUX lora conversion typo by @Clement-Lelievre in #9581
- [Chore] add a note on the versions in Flux LoRA integration tests by @sayakpaul in #9598
- fix vae dtype when accelerate config using --mixed_precision="fp16" by @xduzhangjiayu in #9601
- refac: docstrings in import_utils.py by @yijun-lee in #9583
- Fix for use_safetensors parameters, allow use of parameter on loading submodels by @elismasilva in #9576
- Update distributed_inference.md to include transformer.device_map by @sayakpaul in #9553
- fix: CogVideox train dataset _preprocess_data crop video by @glide-the in #9574
- [LoRA] Handle DoRA better by @sayakpaul in #9547
- Fixed noise_pred_text referenced before assignment. by @LagPixelLOL in #9537
- Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors by @HorizonWind2004 in #9517
- refac/pipeline_output by @yijun-lee in #9582
- [LoRA] allow loras to be loaded with low_cpu_mem_usage. by @sayakpaul in #9510
- add PAG support for SD Img2Img by @SahilCarterr in #9463
- make controlnet support interrupt by @pureexe in #9620
- [LoRA] fix dora test to catch the warning properly. by @sayakpaul in #9627
- flux controlnet control_guidance_start and control_guidance_end implement by @ighoshsubho in #9571
- fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb by @alaister123 in #9634
- Add Differential Diffusion to Kolors by @saqlain2204 in #9423
- FluxMultiControlNetModel by @hlky in #9647
- [CI] replace ubuntu version to 22.04. by @sayakpaul in #9656
- [docs] Fix xDiT doc image damage by @Eigensystem in #9655
- [Tests] increase transformers version in test_low_cpu_mem_usage_with_loading by @sayakpaul in #9662
- Flux - soft inpainting via differential diffusion by @ryanlyn in #9268
- CogView3Plus DiT by @zRzRzRzRzRzRzR in #9570
- Improve the performance and suitable for NPU computing by @leisuzz in #9642
- [Community Pipeline] Add 🪆Matryoshka Diffusion Models by @tolgacangoz in #9157
- Added Lora Support to SD3 Img2Img Pipeline by @SahilCarterr in #9659
- Add pred_original_sample to if not return_dict path by @hlky in #9649
- Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel by @hlky in #9652
- Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel by @hlky in #9651
- Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 by @hlky in #9650
- Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral by @hlky in #9616
- [Fix] when run load pretain with local_files_only, local variable 'cached_folder' referenced before assignment by @RobinXL in #9376
- [Chore] fix import of EntryNotFoundError. by @sayakpaul in #9676
- Dreambooth lora flux bug 3dtensor to 2dtensor by @0x-74 in #9653
- refactor image_processor.py file by @charchit7 in #9608
- [doc] Fix some docstrings in src/diffusers/training_utils.py by @mreraser in #9606
- [docs] refactoring docstrings in community/hd_painter.py by @Jwaminju in #9593
- [docs] refactoring docstrings in models/embeddings_flax.py by @Jwaminju in #9592
- Fix some documentation in ./src/diffusers/models/adapter.py by @ahnjj in #9591
- [training] CogVideoX-I2V LoRA by @a-r-r-o-w in #9482
- [authored by @Anghellia) Add support of Xlabs Controlnets #9638 by @yiyixuxu in #9687
- Docs: CogVideoX by @glide-the in #9578
- Resolves [BUG] 'GatheredParameters' object is not callable by @charchit7 in #9614
- [LoRA] log a warning when there are missing keys in the LoRA loading. by @sayakpaul in #9622
- [SD3 dreambooth-lora training] small updates + bug fixes by @linoytsaban in #9682
- [peft] simple update when unscale by @sweetcocoa in #9689
- [pipeline] CogVideoX-Fun Control by @a-r-r-o-w in #9671
- [core] improve VAE encode/decode framewise batching by @a-r-r-o-w in #9684
- [tests] fix name and unskip CogI2V integration test by @a-r-r-o-w in #9683
- [Flux] Add advanced training script + support textual inversion inference by @linoytsaban in #9434
- [refactor] DiffusionPipeline.download by @a-r-r-o-w in #9557
- [advanced flux lora script] minor updates to readme by @linoytsaban in #9705
- Fix bug in Textual Inversion Unloading by @bonlime in #9304
- Add prompt scheduling callback to community scripts by @hlky in #9718
- [CI] pin max torch version to fix CI errors by @a-r-r-o-w in #9709
- [Docker] pin torch versions in the dockerfiles. by @sayakpaul in #9721
- make deps_table_update to fix CI tests by @a-r-r-o-w in #9720
- [Quantization] Add quantization support for bitsandbytes by @sayakpaul in #9213
- Fix typo in cogvideo pipeline by @lichenyu20 in #9722
- [Docs] docs to xlabs controlnets. by @sayakpaul in #9688
- [docs] add docstrings in pipline_stable_diffusion.py by @jeongiin in #9590
- minor doc/test update by @yiyixuxu in #9734
- [bugfix] reduce float value error when adding noise by @gameofdimension in #9004
- fix singlestep dpm tests by @yiyixuxu in #9716
- Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models by @tolgacangoz in #9723
- Update sd3 controlnet example by @DavyMorgan in #9735
- [Fix] Using sharded checkpoints with gated repositories by @asomoza in #9737
- [bitsandbbytes] follow-ups by @sayakpaul in #9730
- Fix typos by @DN6 in #9739
- is_safetensors_compatible fix by @DN6 in #9741
- Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)
Significant community contributions
The following contributors have made significant changes to the library since the last release:
- @ighoshsubho
- Feature flux controlnet img2img and inpaint pipeline (#9408)
- flux controlnet control_guidance_start and control_guidance_end implement (#9571)
- @noskill
- adapt masked im2im pipeline for SDXL (#7790)
- @saqlain2204
- [Tests] Reduce the model size in the lumina test (#8985)
- Add Differential Diffusion to Kolors (#9423)
- @hlky
- [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
- Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
- Add exponential sigmas to other schedulers and update docs (#9518)
- [Schedulers] Add beta sigmas / beta noise schedule (#9509)
- Add beta sigmas to other schedulers and update docs (#9538)
- FluxMultiControlNetModel (#9647)
- Add pred_original_sample to if not return_dict path (#9649)
- Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
- Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
- Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
- Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
- Add prompt scheduling callback to community scripts (#9718)
- @yiyixuxu
- a few fix for SingleFile tests (#9522)
- update get_parameter_dtype (#9526)
- flux controlnet fix (control_modes batch & others) (#9507)
- [sd3] make sure height and size are divisible by 16 (#9573)
- [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
- minor doc/test update (#9734)
- fix singlestep dpm tests (#9716)
- @PromeAIpro
- [examples] add train flux-controlnet scripts in example. (#9324)
- @juancopi81
- Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
- @glide-the
- fix: CogVideox train dataset _preprocess_data crop video (#9574)
- Docs: CogVideoX (#9578)
- @SahilCarterr
- add PAG support for SD Img2Img (#9463)
- Added Lora Support to SD3 Img2Img Pipeline (#9659)
- @ryanlyn
- Flux - soft inpainting via differential diffusion (#9268)
- @zRzRzRzRzRzRzR
- CogView3Plus DiT (#9570)
- @tolgacangoz
- [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
- Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
- @linoytsaban
- [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
- [Flux] Add advanced training script + support textual inversion inference (#9434)
- [advanced flux lora script] minor updates to readme (#9705)