v0.14.3

microsoft/DeepSpeed

Release date: 2024-06-13 02:14:16

Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)

What's Changed

Update version.txt after 0.14.2 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5458
Add getter and setter methods for compile_backend across accelerators. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5299
Fix torch.compile error for PyTorch v2.3 by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5463
Revert "stage3: efficient compute of scaled_global_grad_norm (#5256)" by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5461
Update ds-chat CI workflow paths to include zero stage 1-3 files by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5462
Update with ops not supported on Windows by @loadams in https://github.com/microsoft/DeepSpeed/pull/5468
fix: swapping order of parameters in create_dir_symlink method. by @alvieirajr in https://github.com/microsoft/DeepSpeed/pull/5465
Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5459
re-introduce: stage3: efficient compute of scaled_global_grad_norm by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5493
Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() by @harygo2 in https://github.com/microsoft/DeepSpeed/pull/5464
Fix compile wrapper by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5455
enable phi3_mini autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5501
Fused adam for HPU by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5500
[manifest] update manifest to add hpp file in csrc. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5522
enable phi2 autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5436
Switch pynvml to nvidia-ml-py by @loadams in https://github.com/microsoft/DeepSpeed/pull/5529
Switch from double quotes to match single quotes by @loadams in https://github.com/microsoft/DeepSpeed/pull/5530
[manifest] update manifest to add hpp file in deepspeed. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5533
New integration - CometMonitor by @alexkuzmik in https://github.com/microsoft/DeepSpeed/pull/5466
Improve _configure_optimizer() final optimizer log by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5528
Enhance testing: Skip fused_optimizer tests if not supported. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5159
Skip the UT cases that use unimplemented op builders. by @foin6 in https://github.com/microsoft/DeepSpeed/pull/5372
rocblas -> hipblas changes for ROCm by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5401
Rocm warp size fix by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5402
CPUAdam fp16 and bf16 support by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5409
Optimize zero3 fetch params using all_reduce by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5420
Fix the TypeError for XPU Accelerator by @shiyang-weng in https://github.com/microsoft/DeepSpeed/pull/5531
Fix RuntimeError for moe on XPU: tensors found at least two devices by @shiyang-weng in https://github.com/microsoft/DeepSpeed/pull/5519
Remove synchronize calls from allgather params by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5516
Avoid overwrite of compiled module wrapper attributes by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5549
Small typos in functions set_none_gradients_to_zero by @TravelLeraLone in https://github.com/microsoft/DeepSpeed/pull/5557
Adapt doc for #4405 by @oraluben in https://github.com/microsoft/DeepSpeed/pull/5552
Update to HF_HOME from TRANSFORMERS_CACHE by @loadams in https://github.com/microsoft/DeepSpeed/pull/4816
[INF] DSAttention allow input_mask to have false as value by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5546
Add throughput timer configuration by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5363
Add Ulysses DistributedAttention compatibility by @Kwen-Chen in https://github.com/microsoft/DeepSpeed/pull/5525
Add hybrid_engine.py as path to trigger the DS-Chat GH workflow by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5562
Update HPU docker version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5566
Rename files in fp_quantize op from quantize.* to fp_quantize.* by @loadams in https://github.com/microsoft/DeepSpeed/pull/5577
[MiCS] Remove the handle print on DeepSpeed side by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5574
Update to fix sidebar over text by @loadams in https://github.com/microsoft/DeepSpeed/pull/5567
DeepSpeedCheckpoint: support custom final ln idx by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5506
Update minor CUDA version compatibility by @adk9 in https://github.com/microsoft/DeepSpeed/pull/5591
Add slide deck for meetup in Japan by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5598
Fixed the Windows build. by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5596
estimate_zero2_model_states_mem_needs: fixing memory estimation by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5099
Fix cuda hardcode for inference woq by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5565
fix sequence parallel(Ulysses) grad scale for zero0 by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5555
Add Compressedbackend for Onebit optimizers by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5473
Updated hpu-gaudi2 tests content. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5622
Pin transformers version for MII tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5629
WA for Torch-compile-Z3-act-apt accuracy issue from the Pytorch repo by @NirSonnenschein in https://github.com/microsoft/DeepSpeed/pull/5590
stage_1_and_2: optimize clip calculation to use clamp by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5632
Fix overlap communication of ZeRO stage 1 and 2 by @penn513 in https://github.com/microsoft/DeepSpeed/pull/5606
fixes in _partition_param_sec function by @mmhab in https://github.com/microsoft/DeepSpeed/pull/5613
assumption of torch.initial_seed function accepting seed arg in DeepSpeedAccelerator abstract class is incorrect by @polisettyvarma in https://github.com/microsoft/DeepSpeed/pull/5569
pipe/_exec_backward_pass: fix immediate grad update by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5605
Monitor was always enabled causing performance degradation by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5633

New Contributors

@alvieirajr made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5465
@harygo2 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5464
@alexkuzmik made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5466
@foin6 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5372
@shiyang-weng made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5531
@TravelLeraLone made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5557
@oraluben made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5552
@Kwen-Chen made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5525
@adk9 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5591
@costin-eseanu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5596
@NirSonnenschein made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5590
@penn513 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5606

Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.2...v0.14.3
