v0.14.1

microsoft/DeepSpeed

Release date: 2024-04-16 03:51:26

Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)

What's Changed

Update version.txt after 0.14.0 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5238
Fp6 blog chinese by @xiaoxiawu-microsoft in https://github.com/microsoft/DeepSpeed/pull/5239
Add contributed HW support into README by @delock in https://github.com/microsoft/DeepSpeed/pull/5240
Set tp world size to 1 in ckpt load, if MPU is not provided by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5243
Make op builder detection adapt to accelerator change by @delock in https://github.com/microsoft/DeepSpeed/pull/5206
Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5264
Add CI for Habana Labs HPU/Gaudi2 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5244
Fix attention mask handling in the Hybrid Engine Bloom flow by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5101
Skip 1Bit Compression and sparsegrad tests for HPU. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5270
Enabled LMCorrectness inference tests on HPU. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5271
Added HPU backend support for torch.compile tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5269
Average only valid part of the ipg buffer. by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5268
Add HPU accelerator support in unit tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5162
Fix loading a universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5263
Add Habana Gaudi2 CI badge to the README by @loadams in https://github.com/microsoft/DeepSpeed/pull/5286
Add intel gaudi to contributed HW in README by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5300
Fixed Accelerate Link by @wkaisertexas in https://github.com/microsoft/DeepSpeed/pull/5314
Enable mixtral 8x7b autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5257
support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5259
fix comms dtype by @mayank31398 in https://github.com/microsoft/DeepSpeed/pull/5297
Modified regular expression by @igeni in https://github.com/microsoft/DeepSpeed/pull/5306
Docs typos fix and grammar suggestions by @Gr0g0 in https://github.com/microsoft/DeepSpeed/pull/5322
Added Gaudi2 CI tests. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5275
Improve universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5289
Increase coverage for HPU by @loadams in https://github.com/microsoft/DeepSpeed/pull/5324
Add NFS path check for default deepspeed triton cache directory by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5323
Correct typo in checking on bf16 unit test support by @loadams in https://github.com/microsoft/DeepSpeed/pull/5317
Make NFS warning print only once by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5345
resolve KeyError: 'PDSH_SSH_ARGS_APPEND' by @Lzhang-hub in https://github.com/microsoft/DeepSpeed/pull/5318
BF16 optimizer: Clear lp grads after updating hp grads in hook by @YangQun1 in https://github.com/microsoft/DeepSpeed/pull/5328
Fix sort of zero checkpoint files by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5342
Add distributed_port for deepspeed.initialize by @LZHgrla in https://github.com/microsoft/DeepSpeed/pull/5260
[fix] fix typo s/simultanenously /simultaneously by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/5359
Update container version for Gaudi2 CI by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5360
compute global norm on device by @BacharL in https://github.com/microsoft/DeepSpeed/pull/5125
logger update with torch master changes by @rogerxfeng8 in https://github.com/microsoft/DeepSpeed/pull/5346
Ensure capacity does not exceed number of tokens by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5353
Update workflows that use cu116 to cu117 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5361
FP [6,8,12] quantizer op by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5336
CPU SHM based inference_all_reduce improve by @delock in https://github.com/microsoft/DeepSpeed/pull/5320
Auto convert moe param groups by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5354
Support MoE for pipeline models by @mosheisland in https://github.com/microsoft/DeepSpeed/pull/5338
Update pytest and transformers with fixes for pytest>= 8.0.0 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5164
Increase CI coverage for Gaudi2 accelerator. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5358
Add CI for Intel XPU/Max1100 by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5376
Update path name on xpu-max1100.yml, add badge in README by @loadams in https://github.com/microsoft/DeepSpeed/pull/5386
Update checkout action on workflows on ubuntu 20.04 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5387
Cleanup required_torch_version code and references. by @loadams in https://github.com/microsoft/DeepSpeed/pull/5370
Update README.md for intel XPU support by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5389
Optimize the fp-dequantizer to get high memory-BW utilization by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/5373
Removal of cuda hardcoded string with get_device function by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5351
Add custom reshaping for universal checkpoint by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5390
fix pagable h2d memcpy by @GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/5301
stage3: efficient compute of scaled_global_grad_norm by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5256
Fix the FP6 kernels compilation problem on non-Ampere GPUs. by @JamesTheZ in https://github.com/microsoft/DeepSpeed/pull/5333

New Contributors

@vshekhawat-hlab made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5270
@wkaisertexas made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5314
@igeni made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5306
@Gr0g0 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5322
@Lzhang-hub made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5318
@YangQun1 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5328
@raza-sikander made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5360
@rogerxfeng8 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5346
@JamesTheZ made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5333

Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.0...v0.14.1

