v0.24.0
Release date: 2024-08-26 22:48:28
Latest mosaicml/composer release: v0.25.0 (2024-09-25 04:56:05)
What's New
1. Torch 2.4 Compatibility (#3542, #3549, #3553, #3552, #3565)
Composer now supports Torch 2.4! We are tracking a few checkpointing-related issues in the latest PyTorch, which we have raised with the PyTorch team:
- [PyTorch Issue] Distributed checkpointing using PyTorch DCP has issues with stateless optimizers, e.g. SGD. We recommend using composer.optim.DecoupledSGDW as a workaround.
- [PyTorch Issue] Distributed checkpointing using PyTorch DCP broke backwards compatibility. We have patched this with a planner in Composer, but this may break loading with custom planners.
2. New checkpointing APIs (#3447, #3474, #3488, #3452)
We've added new checkpointing APIs for downloading, uploading, loading, and saving checkpoints, so that checkpointing is usable outside of a Trainer object. We will be fully migrating to these new APIs in the next minor release.
3. Improved Auto-microbatching (#3510, #3522)
We've fixed deadlocks with auto-microbatching with FSDP, bringing throughput in line with manually setting the microbatch size. This is achieved through enabling sync hooks wherever a training run might OOM to find the correct microbatch size, and disabling these hooks for the rest of training.
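The search for a microbatch size that fits in memory can be illustrated with a minimal sketch. This is not Composer's implementation: it assumes a simple halve-on-OOM strategy, and SimulatedOOM, find_microbatch_size, and make_fake_step are hypothetical names introduced only for this example. It also runs in a single process, so it does not model the sync hooks that keep FSDP ranks in agreement.

```python
class SimulatedOOM(RuntimeError):
    """Stand-in for a CUDA out-of-memory error (hypothetical, for illustration)."""


def find_microbatch_size(train_step, batch_size):
    """Halve the microbatch size until one training step succeeds.

    train_step(microbatch_size) is a hypothetical callable that raises
    SimulatedOOM when the microbatch does not fit in memory.
    """
    microbatch_size = batch_size
    while True:
        try:
            train_step(microbatch_size)
            return microbatch_size
        except SimulatedOOM:
            if microbatch_size == 1:
                raise RuntimeError("Even a microbatch of 1 does not fit in memory")
            # Assumed strategy: retry with half the previous size.
            microbatch_size = max(1, microbatch_size // 2)


def make_fake_step(memory_limit):
    """Build a fake training step that 'OOMs' above memory_limit samples."""
    def step(microbatch_size):
        if microbatch_size > memory_limit:
            raise SimulatedOOM(f"microbatch of {microbatch_size} is too large")
    return step


if __name__ == "__main__":
    # Tries 64, 32, 16 (all too large), then succeeds at 8.
    print(find_microbatch_size(make_fake_step(memory_limit=10), batch_size=64))  # prints 8
```

In a multi-GPU FSDP run, one rank may hit an OOM while the others do not, so every rank has to learn about the failure before retrying. That coordination is what the sync hooks described above provide while the search is in progress.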
Bug Fixes
1. Fix checkpoint symlink uploads (#3376)
Ensures that checkpoint files are uploaded before the symlink file, fixing errors with missing or incomplete checkpoints.
2. Optimizer tracks same parameters after FSDP wrapping (#3502)
When only a subset of parameters should be tracked by the optimizer, FSDP wrapping no longer interferes.
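The ordering guarantee behind the checkpoint symlink fix (#3376) can be sketched as follows. This is an illustration under stated assumptions, not Composer's uploader: upload_checkpoint, the upload callable, and the file names are hypothetical. The point is only that the symlink file is published last, so a reader that follows it never finds missing or partial checkpoint files.

```python
def upload_checkpoint(checkpoint_files, symlink_file, upload):
    """Upload every checkpoint file, then the symlink file that points at them.

    `upload` is a hypothetical callable that takes one path and raises on failure.
    """
    for path in checkpoint_files:
        # Any exception here aborts before the symlink is published.
        upload(path)
    # Only reached once all checkpoint files have been uploaded.
    upload(symlink_file)


# Record the upload order with a list instead of a real object store.
uploaded = []
upload_checkpoint(
    ["ep1/rank0.distcp", "ep1/rank1.distcp", "ep1/.metadata"],  # hypothetical names
    "latest.symlink",  # hypothetical name
    uploaded.append,
)
print(uploaded)  # the symlink file is the last entry
```

If any checkpoint upload fails, the symlink is never written, so the previous symlink (if one exists) keeps pointing at the last complete checkpoint.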
What's Changed
- Bump ipykernel from 6.29.2 to 6.29.5 by @dependabot in https://github.com/mosaicml/composer/pull/3459
- Update torchmetrics requirement from <1.3.3,>=0.10.0 to >=1.4.0.post0,<1.4.1 by @dependabot in https://github.com/mosaicml/composer/pull/3460
- [Checkpoint] Fix symlink issue where symlink file uploaded before checkpoint files upload by @bigning in https://github.com/mosaicml/composer/pull/3376
- Bump databricks-sdk from 0.28.0 to 0.29.0 by @dependabot in https://github.com/mosaicml/composer/pull/3456
- Remove Log Exception by @jjanezhang in https://github.com/mosaicml/composer/pull/3464
- Corrected docs for MFU in SpeedMonitor by @JackZ-db in https://github.com/mosaicml/composer/pull/3469
- [checkpoint v2] Download api by @bigning in https://github.com/mosaicml/composer/pull/3447
- Upload api by @bigning in https://github.com/mosaicml/composer/pull/3474
- [Checkpoint V2] Upload API by @bigning in https://github.com/mosaicml/composer/pull/3488
- Load api by @eracah in https://github.com/mosaicml/composer/pull/3452
- Add helpful comment explaining HSDP initialization seeding by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3470
- Add fit start to mosaicmllogger by @ethanma-db in https://github.com/mosaicml/composer/pull/3467
- Remove OOM-Driven FSDP Deadlocks and Increase Throughput of Automicrobatching by @JackZ-db in https://github.com/mosaicml/composer/pull/3510
- Move hooks and fsdp modules onto state rather than trainer by @JackZ-db in https://github.com/mosaicml/composer/pull/3522
- Bump coverage[toml] from 7.5.4 to 7.6.0 by @dependabot in https://github.com/mosaicml/composer/pull/3471
- revert a wip PR by @bigning in https://github.com/mosaicml/composer/pull/3475
- Change FP8 Eval to default to activation dtype by @j316chuck in https://github.com/mosaicml/composer/pull/3454
- Get a shared file system safe signal file name by @dakinggg in https://github.com/mosaicml/composer/pull/3485
- Bumping flash attention version to v2.6.2 by @ShashankMosaicML in https://github.com/mosaicml/composer/pull/3489
- Bump to Pytorch 2.4 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3542
- Add Torch 2.4 Tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3549
- Fix torch 2.4 images for tests by @snarayan21 in https://github.com/mosaicml/composer/pull/3553
- Fix torch 2.4 tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3552
- Fix bug when subset of model parameters is passed into optimizer with FSDP by @sashaDoubov in https://github.com/mosaicml/composer/pull/3502
- Correctly process parallelism_config['tp'] when it's a dict by @snarayan21 in https://github.com/mosaicml/composer/pull/3434
- [torch2.4] Fix sharded checkpointing backward compatibility issue by @bigning in https://github.com/mosaicml/composer/pull/3565
- [fix-daily] Use composer get_model_state_dict instead of torch's by @eracah in https://github.com/mosaicml/composer/pull/3492
- Load Microbatches instead of Entire Batches to GPU by @JackZ-db in https://github.com/mosaicml/composer/pull/3487
- Make Pytest log in color in Github Action by @eitanturok in https://github.com/mosaicml/composer/pull/3505
- Revert "Load Microbatches instead of Entire Batches to GPU " by @JackZ-db in https://github.com/mosaicml/composer/pull/3508
- Bump transformers version by @dakinggg in https://github.com/mosaicml/composer/pull/3511
- Fix FSDP Config Validation by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3530
- Add FSDP input validation for use_orig_params and activation_cpu_offload flag by @j316chuck in https://github.com/mosaicml/composer/pull/3515
- Fix checkpoint events by @b-chu in https://github.com/mosaicml/composer/pull/3468
- Patch conf.py for readthedocs sphinx injection deprecation. by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3491
- save load path in state and pass to mosaicmllogger by @ethanma-db in https://github.com/mosaicml/composer/pull/3506
- Disable gcs azure daily test by @bigning in https://github.com/mosaicml/composer/pull/3514
- Update huggingface-hub requirement from <0.24,>=0.21.2 to >=0.21.2,<0.25 by @dependabot in https://github.com/mosaicml/composer/pull/3481
- restore version on dev by @XiaohanZhangCMU in https://github.com/mosaicml/composer/pull/3451
- Deprecate deepspeed by @dakinggg in https://github.com/mosaicml/composer/pull/3512
- Update importlib-metadata requirement from <7,>=5.0.0 to >=5.0.0,<9 by @dependabot in https://github.com/mosaicml/composer/pull/3519
- Update peft requirement from <0.12,>=0.10.0 to >=0.10.0,<0.13 by @dependabot in https://github.com/mosaicml/composer/pull/3518
- Use gloo as part of DeviceGPU's process group backend by @snarayan21 in https://github.com/mosaicml/composer/pull/3509
- Add a monitor of mlflow logger so that it sets run status as failed if main thread exits unexpectedly by @chenmoneygithub in https://github.com/mosaicml/composer/pull/3449
- Revert "Use gloo as part of DeviceGPU's process group backend (#3509)" by @snarayan21 in https://github.com/mosaicml/composer/pull/3523
- Fix autoresume docstring (save_overwrite) by @eracah in https://github.com/mosaicml/composer/pull/3526
- Unpin pip by @dakinggg in https://github.com/mosaicml/composer/pull/3524
- hasattr check for Wandb 0.17.6 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3531
- Remove dev on github workflows by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3536
- Remove dev branch in GPU workflows by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3539
- restore google cloud object store test by @bigning in https://github.com/mosaicml/composer/pull/3538
- Update moto[s3] requirement from <5,>=4.0.1 to >=4.0.1,<6 by @dependabot in https://github.com/mosaicml/composer/pull/3516
- use s3 boto3 Adaptive retry as default retry mode by @bigning in https://github.com/mosaicml/composer/pull/3543
- Use python 3.11 in GAs by @eitanturok in https://github.com/mosaicml/composer/pull/3529
- Implement ruff rules enforcing pep 585 by @snarayan21 in https://github.com/mosaicml/composer/pull/3551
- Update numpy requirement from <2.1.0,>=1.21.5 to >=1.21.5,<2.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/3556
- Bump databricks-sdk from 0.29.0 to 0.30.0 by @dependabot in https://github.com/mosaicml/composer/pull/3559
- Update Optim to DecoupledSGD in Notebooks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3554
- Remove lambda code eval testing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3560
- Restore Azure Tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3561
- Remove tokens for to_next_epoch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/3562
- Change iteration timestamp for old checkpoints by @b-chu in https://github.com/mosaicml/composer/pull/3563
- Fix typo in composer_collect_env by @dakinggg in https://github.com/mosaicml/composer/pull/3566
- Add default value to get_device() by @coryMosaicML in https://github.com/mosaicml/composer/pull/3568
- add ghcr and update build matrix generator by @KevDevSha in https://github.com/mosaicml/composer/pull/3465
- Bump aws_ofi_nccl to 1.11.0 by @willgleich in https://github.com/mosaicml/composer/pull/3569
- allow listed runners by @KevDevSha in https://github.com/mosaicml/composer/pull/3486
- fix runner linux-ubuntu > ubuntu-latest by @KevDevSha in https://github.com/mosaicml/composer/pull/3571
- Bump version to v0.24.0 + deprecations by @snarayan21 in https://github.com/mosaicml/composer/pull/3570
New Contributors
- @ethanma-db made their first contribution in https://github.com/mosaicml/composer/pull/3467
- @KevDevSha made their first contribution in https://github.com/mosaicml/composer/pull/3465
Full Changelog: https://github.com/mosaicml/composer/compare/v0.23.5...v0.24.0