v0.14.5
Release date: 2024-08-16 02:04:31
Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)
What's Changed
- Update version.txt after 0.14.4 release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5694
- Fixed Windows inference build. by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5609
- Fix memory leak from _hp_mapping by @chiragjn in https://github.com/microsoft/DeepSpeed/pull/5643
- Bug fix for the "Link bit16 and fp32 parameters in partition" by @U-rara in https://github.com/microsoft/DeepSpeed/pull/5681
- [CPU] add fp16 support to shm inference_all_reduce by @delock in https://github.com/microsoft/DeepSpeed/pull/5669
- Universal checkpoint for zero stage 3 by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5475
- inference unit test injectionPolicy split world_size to multiple tests by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5687
- ENV var added for recaching in INF Unit tests by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5688
- Disable nvtx decorator to avoid graph break by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5697
- Add an argument to enable the injection of missing state during the conversion of universal checkpoints by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5608
- Change source of CPUAdam for xpu accelerator by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5703
- Add additional paths to trigger xpu tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5707
- Update XPU docker version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5712
- update xpu fusedadam opbuilder for pytorch 2.3 by @baodii in https://github.com/microsoft/DeepSpeed/pull/5702
- DeepSpeed Universal Checkpointing: Blog and Tutorial by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5711
- UCP Chinese Blog by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5713
- Fix tutorial links by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5714
- Update node16 check on self-hosted runners and remove python 3.6 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5756
- fix the missing argument in test and typo by @xylian86 in https://github.com/microsoft/DeepSpeed/pull/5730
- [INF] Enable torch compile for inference by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5612
- Update checkout action for nv-human-eval workflow by @loadams in https://github.com/microsoft/DeepSpeed/pull/5757
- Add Windows scripts (deepspeed, ds_report). by @costin-eseanu in https://github.com/microsoft/DeepSpeed/pull/5699
- Unit Test: Add error handling for rate limit exceeded in model list by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5715
- Fix memory leak for pipelined optimizer swapper by @mauryaavinash95 in https://github.com/microsoft/DeepSpeed/pull/5700
- Remove duplicated variable by @xu-song in https://github.com/microsoft/DeepSpeed/pull/5727
- Fix phi3 mini 128k load error by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5765
- [CPU] Allow deepspeed.comm.inference_all_reduce in torch.compile graph by @delock in https://github.com/microsoft/DeepSpeed/pull/5604
- Added wrappers for hpu tensors based on dtype by @deepcharm in https://github.com/microsoft/DeepSpeed/pull/5771
- [bugfix] promote state in bf16_optimizer by @billishyahao in https://github.com/microsoft/DeepSpeed/pull/5767
- Launcher mode with SSH bypass by @dogacancolak-kensho in https://github.com/microsoft/DeepSpeed/pull/5728
- Update the list of supported models in the Chinese README of fastgen by @beep-bebop in https://github.com/microsoft/DeepSpeed/pull/5773
- Add support for Microsoft Phi-3 model to DeepSpeed-FastGen by @adk9 in https://github.com/microsoft/DeepSpeed/pull/5559
- Misplaced global variable `warned` by @anferico in https://github.com/microsoft/DeepSpeed/pull/5725
- Fixes for latest Huggingface_hub changes on modelId -> id by @loadams in https://github.com/microsoft/DeepSpeed/pull/5789
- reduce all-to-all communication volume when both expert and non-expert are tensor-parallel by @taozhiwei in https://github.com/microsoft/DeepSpeed/pull/5626
- Update Ubuntu version for running python tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5783
- fix: quantization with DeepSpeed HE by @Atry in https://github.com/microsoft/DeepSpeed/pull/5624
- [INF] Add Qwen2RMSNorm to loaded layers in auto_tp by @oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5786
- Add chatglm2 & chatglm3 autotp by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5540
- Add new autotp supported model in doc by @Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5785
- Fix accuracy error of NPUFusedAdam by @penn513 in https://github.com/microsoft/DeepSpeed/pull/5777
- Update torch version in cpu-torch-latest and nv-torch-latest-v100 tests to 2.4 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5797
- move is_checkpointable call reducing torch.compile Graph breaks by @NirSonnenschein in https://github.com/microsoft/DeepSpeed/pull/5759
- Unpin transformers version by @loadams in https://github.com/microsoft/DeepSpeed/pull/5650
- Update other workflows to run on Ubuntu 22.04 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5798
- [XPU]Use host time to replace xpu time when IPEX version slower than 2.5. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5796
- Update MII tests to pull correct torchvision by @loadams in https://github.com/microsoft/DeepSpeed/pull/5800
- Add fp8-fused gemm kernel by @sfc-gh-reyazda in https://github.com/microsoft/DeepSpeed/pull/5764
- Add doc of compressed backend in Onebit optimizers by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5782
- fix: handle exception when loading cache file in test_inference.py by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5802
- Pin transformers version for MII tests by @loadams in https://github.com/microsoft/DeepSpeed/pull/5807
- Fix op_builder for CUDA 12.5 by @keshavkowshik in https://github.com/microsoft/DeepSpeed/pull/5806
- Find ROCm on Fedora by @trixirt in https://github.com/microsoft/DeepSpeed/pull/5705
- Fix CPU Adam JIT compilation by @lekurile in https://github.com/microsoft/DeepSpeed/pull/5780
- GDS AIO Blog by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/5817
- [ROCm] Get rocm version from /opt/rocm/.info/version by @rraminen in https://github.com/microsoft/DeepSpeed/pull/5815
- sequence parallel with communication overlap by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/5691
- Update to ROCm6 by @loadams in https://github.com/microsoft/DeepSpeed/pull/5491
- Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen by @ZonePG in https://github.com/microsoft/DeepSpeed/pull/5403
- Use accelerator to replace cuda in setup and runner by @Andy666G in https://github.com/microsoft/DeepSpeed/pull/5769
- Link GDS blog to site by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5820
- Non-reentrant checkpointing hook fix by @ic-synth in https://github.com/microsoft/DeepSpeed/pull/5781
- Fix NV references by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5821
- Fix docs building guide by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5825
- Update clang-format version from 16 to 18. by @loadams in https://github.com/microsoft/DeepSpeed/pull/5839
- Add Japanese translation of DeepNVMe blog by @tohtana in https://github.com/microsoft/DeepSpeed/pull/5845
- Fix the bug of deepspeed sequence parallel working with batch size larger than 1 by @YJHMITWEB in https://github.com/microsoft/DeepSpeed/pull/5823
- Upgrade HPU image to v1.16.2. by @vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5610
- OptimizedLinear updates by @jeffra in https://github.com/microsoft/DeepSpeed/pull/5791
- Log operator warnings only in verbose mode by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5917
- Use `torch.nan_to_num` replace numpy wrapper one by @jinyouzhi in https://github.com/microsoft/DeepSpeed/pull/5877
- [Zero2] Reduce the unnecessary all-reduce when tensor size is 0. by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/5868
- Update container version for Gaudi2 CI by @raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5937
- Fix missing ds_id bug by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/5824
- Update LR scheduler configuration by @xiyang-aads-lilly in https://github.com/microsoft/DeepSpeed/pull/5846
- HPUAccelerator: remove support in set_visible_devices_envs by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5929
- Z3: optimizations for grad norm calculation and gradient clipping by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5504
- Update xpu-max1100.yml with new config and add some tests by @Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5668
- Add accelerator setup guides by @delock in https://github.com/microsoft/DeepSpeed/pull/5827
- Allow accelerator to instantiate the device by @nelyahu in https://github.com/microsoft/DeepSpeed/pull/5255
New Contributors
- @U-rara made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5681
- @xylian86 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5475
- @mauryaavinash95 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5700
- @billishyahao made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5767
- @dogacancolak-kensho made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5728
- @beep-bebop made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5773
- @anferico made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5725
- @Atry made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5624
- @sfc-gh-reyazda made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5764
- @keshavkowshik made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5806
- @trixirt made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5705
- @Andy666G made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5769
- @ic-synth made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5781
- @xiyang-aads-lilly made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5846
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.14.4...v0.14.5