v0.14.1
Release date: 2024-04-16 03:51:26
Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)
What's Changed
- Update version.txt after 0.14.0 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5238
- Fp6 blog chinese by @xiaoxiawu-microsoft in https://github.com/microsoft/DeepSpeed/pull/5239
- Add contributed HW support into README by @delock in https://github.com/microsoft/DeepSpeed/pull/5240
- Set tp world size to 1 in ckpt load, if MPU is not provided by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5243
- Make op builder detection adapt to accelerator change by @delock in https://github.com/microsoft/DeepSpeed/pull/5206
- Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5264
- Add CI for Habana Labs HPU/Gaudi2 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5244
- Fix attention mask handling in the Hybrid Engine Bloom flow by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5101
- Skip 1Bit Compression and sparsegrad tests for HPU. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5270
- Enabled LMCorrectness inference tests on HPU. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5271
- Added HPU backend support for torch.compile tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5269
- Average only valid part of the ipg buffer. by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5268
- Add HPU accelerator support in unit tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5162
- Fix loading a universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5263
- Add Habana Gaudi2 CI badge to the README by @loadams in https://github.com/microsoft/DeepSpeed/pull/5286
- Add intel gaudi to contributed HW in README by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5300
- Fixed Accelerate Link by @wkaisertexas in https://github.com/microsoft/DeepSpeed/pull/5314
- Enable mixtral 8x7b autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5257
- support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5259
- fix comms dtype by @mayank31398 in https://github.com/microsoft/DeepSpeed/pull/5297
- Modified regular expression by @igeni in https://github.com/microsoft/DeepSpeed/pull/5306
- Docs typos fix and grammar suggestions by @Gr0g0 in https://github.com/microsoft/DeepSpeed/pull/5322
- Added Gaudi2 CI tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5275
- Improve universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5289
- Increase coverage for HPU by @loadams in https://github.com/microsoft/DeepSpeed/pull/5324
- Add NFS path check for default deepspeed triton cache directory by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5323
- Correct typo in checking on bf16 unit test support by @loadams in https://github.com/microsoft/DeepSpeed/pull/5317
- Make NFS warning print only once by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5345
- resolve KeyError: 'PDSH_SSH_ARGS_APPEND' by @Lzhang-hub in https://github.com/microsoft/DeepSpeed/pull/5318
- BF16 optimizer: Clear lp grads after updating hp grads in hook by @YangQun1 in https://github.com/microsoft/DeepSpeed/pull/5328
- Fix sort of zero checkpoint files by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5342
- Add distributed_port for deepspeed.initialize by @LZHgrla in https://github.com/microsoft/DeepSpeed/pull/5260
- [fix] fix typo s/simultanenously /simultaneously by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/5359
- Update container version for Gaudi2 CI by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5360
- compute global norm on device by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5125
- logger update with torch master changes by @rogerxfeng8 in https://github.com/microsoft/DeepSpeed/pull/5346
- Ensure capacity does not exceed number of tokens by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5353
- Update workflows that use cu116 to cu117 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5361
- FP [6,8,12] quantizer op by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5336
- CPU SHM based inference_all_reduce improve by @delock in https://github.com/microsoft/DeepSpeed/pull/5320
- Auto convert moe param groups by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5354
- Support MoE for pipeline models by @mosheisland in https://github.com/microsoft/DeepSpeed/pull/5338
- Update pytest and transformers with fixes for pytest>= 8.0.0 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5164
- Increase CI coverage for Gaudi2 accelerator. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5358
- Add CI for Intel XPU/Max1100 by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5376
- Update path name on xpu-max1100.yml, add badge in README by @loadams in https://github.com/microsoft/DeepSpeed/pull/5386
- Update checkout action on workflows on ubuntu 20.04 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5387
- Cleanup required_torch_version code and references. by @loadams in https://github.com/microsoft/DeepSpeed/pull/5370
- Update README.md for intel XPU support by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5389
- Optimize the fp-dequantizer to get high memory-BW utilization by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/5373
- Removal of cuda hardcoded string with get_device function by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5351
- Add custom reshaping for universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5390
- fix pagable h2d memcpy by @GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/5301
- stage3: efficient compute of scaled_global_grad_norm by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5256
- Fix the FP6 kernels compilation problem on non-Ampere GPUs. by @JamesTheZ in https://github.com/microsoft/DeepSpeed/pull/5333
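For context on #5353 ("Ensure capacity does not exceed number of tokens"), here is a minimal sketch of how a per-expert capacity for MoE gating can be computed and clamped. It assumes the common GShard/Switch-style formula, ceil(num_tokens / num_experts * capacity_factor) raised to a floor of min_capacity. The function name expert_capacity and its parameter defaults are illustrative assumptions, not DeepSpeed's actual code or API.

```python
import math


def expert_capacity(num_tokens: int, num_experts: int,
                    capacity_factor: float = 1.0, min_capacity: int = 4) -> int:
    """Per-expert token capacity for MoE gating (hypothetical helper, illustrative only).

    Assumes the GShard/Switch-style formula:
        ceil(num_tokens / num_experts * capacity_factor), floored at min_capacity.
    The final clamp keeps capacity from exceeding the number of tokens.
    """
    capacity = math.ceil((num_tokens / num_experts) * capacity_factor)
    capacity = max(capacity, min_capacity)
    # Without this clamp, a large min_capacity (or capacity_factor) combined with
    # very few tokens would reserve more slots per expert than there are tokens.
    return min(capacity, num_tokens)


print(expert_capacity(1024, 8))                     # 128
print(expert_capacity(2, 8))                        # 2 (the floor alone would give 4)
print(expert_capacity(16, 4, capacity_factor=2.0))  # 8
```

The clamp only matters when there are very few tokens, for example small inference batches. In that case the min_capacity floor would otherwise allocate buffers larger than the input.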
New Contributors
- @vshekhawat-hlab made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5270
- @wkaisertexas made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5314
- @igeni made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5306
- @Gr0g0 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5322
- @Lzhang-hub made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5318
- @YangQun1 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5328
- @raza-sikander made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5360
- @rogerxfeng8 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5346
- @JamesTheZ made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5333
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.0...v0.14.1