v0.14.5

microsoft/DeepSpeed

Release date: 2024-08-16 02:04:31

Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)

What's Changed

Update version.txt after 0.14.4 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5694
Fixed Windows inference build. by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5609
Fix memory leak from _hp_mapping by @chiragjn in https://github.com/microsoft/DeepSpeed/pull/5643
Bug fix for the "Link bit16 and fp32 parameters in partition" by @U-rara in https://github.com/microsoft/DeepSpeed/pull/5681
[CPU] add fp16 support to shm inference_all_reduce by @delock in https://github.com/microsoft/DeepSpeed/pull/5669
Universal checkpoint for zero stage 3 by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5475
inference unit test injectionPolicy split world_size to multiple tests by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5687
ENV var added for recaching in INF Unit tests by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5688
Disable nvtx decorator to avoid graph break by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5697
Add an argument to enable the injection of missing state during the conversion of universal checkpoints by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5608
Change source of CPUAdam for xpu accelerator by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5703
Add additional paths to trigger xpu tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5707
Update XPU docker version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5712
update xpu fusedadam opbuilder for pytorch 2.3 by @baodii in https://github.com/microsoft/DeepSpeed/pull/5702
DeepSpeed Universal Checkpointing: Blog and Tutorial by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5711
UCP Chinese Blog by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5713
Fix tutorial links by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5714
Update node16 check on self-hosted runners and remove python 3.6 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5756
fix the missing argument in test and typo by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5730
[INF] Enable torch compile for inference by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5612
Update checkout action for nv-human-eval workflow by @loadams in https://github.com/microsoft/DeepSpeed/pull/5757
Add Windows scripts (deepspeed, ds_report). by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5699
Unit Test: Add error handling for rate limit exceeded in model list by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5715
Fix memory leak for pipelined optimizer swapper by @mauryaavinash95 in https://github.com/microsoft/DeepSpeed/pull/5700
Remove duplicated variable by @xu-song in https://github.com/microsoft/DeepSpeed/pull/5727
Fix phi3 mini 128k load error by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5765
[CPU] Allow deepspeed.comm.inference_all_reduce in torch.compile graph by @delock in https://github.com/microsoft/DeepSpeed/pull/5604
Added wrappers for hpu tensors based on dtype by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5771
[bugfix] promote state in bf16_optimizer by @billishyahao in https://github.com/microsoft/DeepSpeed/pull/5767
Launcher mode with SSH bypass by @dogacancolak-kensho in https://github.com/microsoft/DeepSpeed/pull/5728
Update the list of supported models in the Chinese README of fastgen by @beep-bebop in https://github.com/microsoft/DeepSpeed/pull/5773
Add support for Microsoft Phi-3 model to DeepSpeed-FastGen by @adk9 in https://github.com/microsoft/DeepSpeed/pull/5559
Misplaced global variable 'warned' by @anferico in https://github.com/microsoft/DeepSpeed/pull/5725
Fixes for latest Huggingface_hub changes on modelId -> id by @loadams in https://github.com/microsoft/DeepSpeed/pull/5789
reduce all-to-all communication volume when both expert and non-expert are tensor-parallel by @taozhiwei in https://github.com/microsoft/DeepSpeed/pull/5626
Update Ubuntu version for running python tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5783
fix: quantization with DeepSpeed HE by @Atry in https://github.com/microsoft/DeepSpeed/pull/5624
[INF] Add Qwen2RMSNorm to loaded layers in auto_tp by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5786
Add chatglm2 & chatglm3 autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5540
Add new autotp supported model in doc by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5785
Fix accuracy error of NPUFusedAdam by @penn513 in https://github.com/microsoft/DeepSpeed/pull/5777
Update torch version in cpu-torch-latest and nv-torch-latest-v100 tests to 2.4 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5797
move is_checkpointable call reducing torch.compile Graph breaks by @NirSonnenschein in https://github.com/microsoft/DeepSpeed/pull/5759
Unpin transformers version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5650
Update other workflows to run on Ubuntu 22.04 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5798
[XPU] Use host time instead of xpu time when the IPEX version is lower than 2.5 by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5796
Update MII tests to pull correct torchvision by @loadams in https://github.com/microsoft/DeepSpeed/pull/5800
Add fp8-fused gemm kernel by @sfc-gh-reyazda in https://github.com/microsoft/DeepSpeed/pull/5764
Add doc of compressed backend in Onebit optimizers by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5782
fix: handle exception when loading cache file in test_inference.py by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5802
Pin transformers version for MII tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5807
Fix op_builder for CUDA 12.5 by @keshavkowshik in https://github.com/microsoft/DeepSpeed/pull/5806
Find ROCm on Fedora by @trixirt in https://github.com/microsoft/DeepSpeed/pull/5705
Fix CPU Adam JIT compilation by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5780
GDS AIO Blog by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/5817
[ROCm] Get rocm version from /opt/rocm/.info/version by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5815
sequence parallel with communication overlap by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5691
Update to ROCm6 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5491
Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen by @ZonePG in https://github.com/microsoft/DeepSpeed/pull/5403
Use accelerator to replace cuda in setup and runner by @Andy666G in https://github.com/microsoft/DeepSpeed/pull/5769
Link GDS blog to site by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5820
Non-reentrant checkpointing hook fix by @ic-synth in https://github.com/microsoft/DeepSpeed/pull/5781
Fix NV references by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5821
Fix docs building guide by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5825
Update clang-format version from 16 to 18. by @loadams in https://github.com/microsoft/DeepSpeed/pull/5839
Add Japanese translation of DeepNVMe blog by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5845
Fix the bug of deepspeed sequence parallel working with batch size larger than 1 by @YJHMITWEB in https://github.com/microsoft/DeepSpeed/pull/5823
Upgrade HPU image to v1.16.2. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5610
OptimizedLinear updates by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5791
Log operator warnings only in verbose mode by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5917
Use torch.nan_to_num to replace the numpy wrapper by @jinyouzhi in https://github.com/microsoft/DeepSpeed/pull/5877
[Zero2] Reduce the unnecessary all-reduce when tensor size is 0. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5868
Update container version for Gaudi2 CI by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5937
Fix missing ds_id bug by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5824
Update LR scheduler configuration by @xiyang-aads-lilly in https://github.com/microsoft/DeepSpeed/pull/5846
HPUAccelerator: remove support in set_visible_devices_envs by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5929
Z3: optimizations for grad norm calculation and gradient clipping by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5504
Update xpu-max1100.yml with new config and add some tests by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5668
Add accelerator setup guides by @delock in https://github.com/microsoft/DeepSpeed/pull/5827
Allow accelerator to instantiate the device by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5255

New Contributors

@U-rara made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5681
@xylian86 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5475
@mauryaavinash95 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5700
@billishyahao made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5767
@dogacancolak-kensho made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5728
@beep-bebop made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5773
@anferico made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5725
@Atry made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5624
@sfc-gh-reyazda made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5764
@keshavkowshik made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5806
@trixirt made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5705
@Andy666G made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5769
@ic-synth made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5781
@xiyang-aads-lilly made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5846

Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.4...v0.14.5
