v0.7.4
Release date: 2022-10-22 05:11:28
Latest microsoft/DeepSpeed release: v0.15.1 (2024-09-05 09:30:51)
What's Changed
- MOE residual matmult unit test by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/2323
- MOE matmult with memaccess by @samadejacobs in https://github.com/microsoft/DeepSpeed/pull/2336
- Refactor residual add kernels by @arashb in https://github.com/microsoft/DeepSpeed/pull/2333
- mem access for quantize kernel by @GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/2331
- increase min pre-commit versions by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2346
- Extend scratch buffer for long prompts by @cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2212
- [docs] fix zero docs by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2350
- Staging profile inference v1 (#2348) by @awan-10 in https://github.com/microsoft/DeepSpeed/pull/2349
- Kernel Data Conversion Utility by @cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2327
- Add Onebit Optimizers in init by @l4d2boomer in https://github.com/microsoft/DeepSpeed/pull/2340
- docs(mixture-of-experts-inference): fix typo in tuto by @jqueguiner in https://github.com/microsoft/DeepSpeed/pull/2345
- Use blob storage for datasets in unit tests by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2342
- Refactor gptj_residual_add kernels for better readability by @arashb in https://github.com/microsoft/DeepSpeed/pull/2358
- Updated issue templates by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2363
- fix cuda invalid config error in dequant kernel by @GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/2362
- Add missing pytest fixture scope by @arashb in https://github.com/microsoft/DeepSpeed/pull/2353
- Extend residual_add kernel tests to cover pre_attn_norm by @arashb in https://github.com/microsoft/DeepSpeed/pull/2354
- Refactor fused_bias_residual kernels for better readability by @arashb in https://github.com/microsoft/DeepSpeed/pull/2356
- Capture error message during sweep tests by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/2351
- Fix an exception when auto-casting dicts to fp16 by @mjksmith in https://github.com/microsoft/DeepSpeed/pull/2370
- Refactor remaining distributed tests by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2216
- Fix the MLP output tensor's shape by @arashb in https://github.com/microsoft/DeepSpeed/pull/2380
- add 11.8 to cuda_minor_mismatch_ok to allow building with current CUDA by @Thomas-MMJ in https://github.com/microsoft/DeepSpeed/pull/2390
- Pin Transformers test version by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2402
- Change type to tuple in replace_wo_policy isinstance check by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2387
- Checkpoint backwards-compatibility workaround by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/2384
- Add Predicated Global Load to Memory Access Utils by @cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2373
- MII blog post by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2418
- Fix figure reference by @awan-10 in https://github.com/microsoft/DeepSpeed/pull/2419
- Add SLURM Multinode Runner by @dashstander in https://github.com/microsoft/DeepSpeed/pull/2404
- Fix issue with corrupted output on long generation for GPT by @andrewchernyh in https://github.com/microsoft/DeepSpeed/pull/2359
- Fix GPT Neo-X multi-gpu inference by @andrewchernyh in https://github.com/microsoft/DeepSpeed/pull/2401
- CI fixes related to triton by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2422
- [docs] update mii blog title by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2423
- add SD injection policy by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2381
- Fix checkpoint loading when it is a dictionary by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2425
- Make error regex more generic in collect_results.py by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/2415
- fixes #2389 by @clumsy in https://github.com/microsoft/DeepSpeed/pull/2411
- Fix for inference gpt-j test by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2430
- Fixing bug 2361 by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/2410
- Universal checkpoint for zero stage 1 by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/2284
- only add deps if extra is explicitly called by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2432
- Add TestInjectionPolicy inference unittest class for testing custom injection policies by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2426
- [memory estimators] new config args sync by @stas00 in https://github.com/microsoft/DeepSpeed/pull/2431
- parallelize writing of layer checkpoint files across data parallel instances by @adammoody in https://github.com/microsoft/DeepSpeed/pull/1419
- Fix broken link to DeepSpeed Megatron fork by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2440
New Contributors
- @l4d2boomer made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2340
- @jqueguiner made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2345
- @mjksmith made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2370
- @Thomas-MMJ made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2390
- @lekurile made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2387
- @dashstander made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2404
- @andrewchernyh made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2359
- @clumsy made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2411
- @jomayeri made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2410
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.7.3...v0.7.4