v0.14.3
Release date: 2024-06-13 02:14:16
Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)
What's Changed
- Update version.txt after 0.14.2 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5458
- Add getter and setter methods for compile_backend across accelerators. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5299
- Fix torch.compile error for PyTorch v2.3 by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5463
- Revert "stage3: efficient compute of scaled_global_grad_norm (#5256)" by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5461
- Update ds-chat CI workflow paths to include zero stage 1-3 files by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5462
- Update with ops not supported on Windows by @loadams in https://github.com/microsoft/DeepSpeed/pull/5468
- fix: swapping order of parameters in create_dir_symlink method. by @alvieirajr in https://github.com/microsoft/DeepSpeed/pull/5465
- Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5459
- re-introduce: stage3: efficient compute of scaled_global_grad_norm by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5493
- Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() by @harygo2 in https://github.com/microsoft/DeepSpeed/pull/5464
- Fix compile wrapper by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5455
- enable phi3_mini autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5501
- Fused adam for HPU by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5500
- [manifest] update manifest to add hpp file in csrc. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5522
- enable phi2 autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5436
- Switch pynvml to nvidia-ml-py by @loadams in https://github.com/microsoft/DeepSpeed/pull/5529
- Switch from double quotes to match single quotes by @loadams in https://github.com/microsoft/DeepSpeed/pull/5530
- [manifest] update manifest to add hpp file in deepspeed. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5533
- New integration - CometMonitor by @alexkuzmik in https://github.com/microsoft/DeepSpeed/pull/5466
- Improve _configure_optimizer() final optimizer log by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5528
- Enhance testing: Skip fused_optimizer tests if not supported. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5159
- Skip the UT cases that use unimplemented op builders. by @foin6 in https://github.com/microsoft/DeepSpeed/pull/5372
- rocblas -> hipblas changes for ROCm by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5401
- Rocm warp size fix by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5402
- CPUAdam fp16 and bf16 support by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5409
- Optimize zero3 fetch params using all_reduce by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5420
- Fix the TypeError for XPU Accelerator by @shiyang-weng in https://github.com/microsoft/DeepSpeed/pull/5531
- Fix RuntimeError for moe on XPU: tensors found at least two devices by @shiyang-weng in https://github.com/microsoft/DeepSpeed/pull/5519
- Remove synchronize calls from allgather params by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5516
- Avoid overwrite of compiled module wrapper attributes by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5549
- Small typos in functions set_none_gradients_to_zero by @TravelLeraLone in https://github.com/microsoft/DeepSpeed/pull/5557
- Adapt doc for #4405 by @oraluben in https://github.com/microsoft/DeepSpeed/pull/5552
- Update to HF_HOME from TRANSFORMERS_CACHE by @loadams in https://github.com/microsoft/DeepSpeed/pull/4816
- [INF] DSAttention allow input_mask to have false as value by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5546
- Add throughput timer configuration by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5363
- Add Ulysses DistributedAttention compatibility by @Kwen-Chen in https://github.com/microsoft/DeepSpeed/pull/5525
- Add hybrid_engine.py as path to trigger the DS-Chat GH workflow by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5562
- Update HPU docker version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5566
- Rename files in fp_quantize op from quantize.* to fp_quantize.* by @loadams in https://github.com/microsoft/DeepSpeed/pull/5577
- [MiCS] Remove the handle print on DeepSpeed side by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5574
- Update to fix sidebar over text by @loadams in https://github.com/microsoft/DeepSpeed/pull/5567
- DeepSpeedCheckpoint: support custom final ln idx by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5506
- Update minor CUDA version compatibility by @adk9 in https://github.com/microsoft/DeepSpeed/pull/5591
- Add slide deck for meetup in Japan by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5598
- Fixed the Windows build. by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5596
- estimate_zero2_model_states_mem_needs: fixing memory estimation by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5099
- Fix cuda hardcode for inference woq by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5565
- fix sequence parallel(Ulysses) grad scale for zero0 by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5555
- Add Compressedbackend for Onebit optimizers by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5473
- Updated hpu-gaudi2 tests content. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5622
- Pin transformers version for MII tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5629
- WA for Torch-compile-Z3-act-apt accuracy issue from the Pytorch repo by @NirSonnenschein in https://github.com/microsoft/DeepSpeed/pull/5590
- stage_1_and_2: optimize clip calculation to use clamp by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5632
- Fix overlap communication of ZeRO stage 1 and 2 by @penn513 in https://github.com/microsoft/DeepSpeed/pull/5606
- fixes in _partition_param_sec function by @mmhab in https://github.com/microsoft/DeepSpeed/pull/5613
- assumption of torch.initial_seed function accepting seed arg in DeepSpeedAccelerator abstract class is incorrect by @polisettyvarma in https://github.com/microsoft/DeepSpeed/pull/5569
- pipe/_exec_backward_pass: fix immediate grad update by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5605
- Monitor was always enabled causing performance degradation by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5633
New Contributors
- @alvieirajr made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5465
- @harygo2 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5464
- @alexkuzmik made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5466
- @foin6 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5372
- @shiyang-weng made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5531
- @TravelLeraLone made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5557
- @oraluben made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5552
- @Kwen-Chen made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5525
- @adk9 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5591
- @costin-eseanu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5596
- @NirSonnenschein made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5590
- @penn513 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5606
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.2...v0.14.3