v1.1
Released: 2024-10-02 04:51:39
Latest release of bghira/SimpleTuner: v1.1.1 (2024-10-05 08:37:33)
Features
Performance
- Improved launch speed for large datasets (>1M samples)
- Improved speed for quantising on CPU
- Optional support for directly quantising on the GPU near-instantly (--quantize_via)
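A minimal launch sketch for the GPU quantisation path, assuming the usual train.py entry point; --quantize_via comes from the notes above, while the accelerator value, the --base_model_precision flag, and the int8-quanto level shown are illustrative assumptions rather than confirmed spellings.

```bash
# Sketch only: quantise the base model on the GPU at startup instead of the CPU.
# --quantize_via is the flag named in these notes; "accelerator" as its value,
# --base_model_precision, and int8-quanto are assumptions for illustration.
python train.py \
  --base_model_precision=int8-quanto \
  --quantize_via=accelerator
```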
Compatibility
- SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
- Updated documentation to make multi-GPU configuration a bit more obvious.
- Improved support for torch.compile(), including automatically disabling it when e.g. fp8-quanto is enabled
- Enable via accelerate config or config/config.env via TRAINER_DYNAMO_BACKEND=inductor (see the config sketch after this list)
- TorchAO for quantisation as an alternative to Optimum Quanto for int8 weight-only quantisation (int8-torchao; see the sketch after this list)
- f8uz-quanto, a compatibility level for AMD users to experiment with FP8 training dynamics
- Support for multi-GPU PEFT LoRA training with Quanto enabled (not fp8-quanto); previously, only LyCORIS would reliably work with quantised multi-GPU training sessions.
- Ability to quantise models during full fine-tuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; this is an experimental configuration.
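Two hedged sketches for the items above. First, enabling torch.compile() through config/config.env: TRAINER_DYNAMO_BACKEND=inductor is taken directly from the notes; treat everything else as assumption.

```bash
# config/config.env (sketch): opt into torch.compile() via the inductor backend.
# Per the notes, the trainer automatically disables compilation when
# e.g. fp8-quanto is enabled.
TRAINER_DYNAMO_BACKEND=inductor
```

Second, selecting the new quantisation backends; the precision-level names come from the notes above, but the --base_model_precision flag spelling is an assumption here.

```bash
# Sketch: choose a quantisation backend by precision level (flag name assumed).
python train.py --base_model_precision=int8-torchao   # TorchAO int8 weight-only
# Alternatives named in these notes:
#   f8uz-quanto  - FP8 compatibility level for AMD users
#   fp8-quanto   - not yet supported for multi-GPU PEFT LoRA
```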
Integrations
- Images now get logged to TensorBoard (thanks @anhi)
- FastAPI endpoints for integrations (undocumented)
- "raw" webhook type that sends a large number of HTTP requests containing events, useful for push notification type service
Optims
- SOAP optimiser support: uses fp32 gradients, nice and accurate, but uses more memory than other optims; by default it slows down every 10 steps as it preconditions
- New 8-bit and 4-bit optimiser options from TorchAO (ao-adamw8bit, ao-adamw4bit, etc.)
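A hedged selection sketch for the new optimisers; the optimiser names appear in the notes above, while the --optimizer flag spelling and the train.py entry point are assumptions for illustration.

```bash
# Sketch: pick one of the new optimisers by name (flag spelling assumed).
python train.py --optimizer=soap            # fp32 gradients; preconditions every 10 steps
# python train.py --optimizer=ao-adamw8bit  # TorchAO 8-bit AdamW
# python train.py --optimizer=ao-adamw4bit  # TorchAO 4-bit AdamW
```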
Pull Requests
- Fix flux cfg sampling bug by @AmericanPresidentJimmyCarter in https://github.com/bghira/SimpleTuner/pull/981
- merge by @bghira in https://github.com/bghira/SimpleTuner/pull/982
- FastAPI endpoints for managing trainer as a service by @bghira in https://github.com/bghira/SimpleTuner/pull/969
- constant lr resume fix for optimi-stableadamw by @bghira in https://github.com/bghira/SimpleTuner/pull/984
- clear data backends before configuring new ones by @bghira in https://github.com/bghira/SimpleTuner/pull/992
- update to latest quanto main by @bghira in https://github.com/bghira/SimpleTuner/pull/994
- log images in tensorboard by @anhi in https://github.com/bghira/SimpleTuner/pull/998
- merge by @bghira in https://github.com/bghira/SimpleTuner/pull/999
- torchao: add int8; quanto: add NF4; torch compile fixes + ability to compile optim by @bghira in https://github.com/bghira/SimpleTuner/pull/986
- update flux quickstart by @bghira in https://github.com/bghira/SimpleTuner/pull/1000
- compile optimiser by @bghira in https://github.com/bghira/SimpleTuner/pull/1001
- optimizer compile step only by @bghira in https://github.com/bghira/SimpleTuner/pull/1002
- remove optimiser compilation arg by @bghira in https://github.com/bghira/SimpleTuner/pull/1003
- remove optim compiler from options by @bghira in https://github.com/bghira/SimpleTuner/pull/1004
- remove optim compiler from options by @bghira in https://github.com/bghira/SimpleTuner/pull/1005
- SOAP optimiser; int4 fixes for 4090 by @bghira in https://github.com/bghira/SimpleTuner/pull/1006
- torchao: install 0.5.0 from pytorch source by @bghira in https://github.com/bghira/SimpleTuner/pull/1007
- update safety check warning with guidance toward cache clear interval for OOM issues by @bghira in https://github.com/bghira/SimpleTuner/pull/1008
- fix webhook contents for discord by @bghira in https://github.com/bghira/SimpleTuner/pull/1011
- fp8-quanto fixes, unblocking of PEFT multigpu LoRA training for other precision levels by @bghira in https://github.com/bghira/SimpleTuner/pull/1013
- quanto: activations sledgehammer by @bghira in https://github.com/bghira/SimpleTuner/pull/1014
- 1.1 merge window by @bghira in https://github.com/bghira/SimpleTuner/pull/1010
Full Changelog: https://github.com/bghira/SimpleTuner/compare/v1.0.1...v1.1