v0.1.2
Release date: 2022-04-06 13:45:26
## Overview
Here are the main improvements of this release:
- Support training MoE and BERT models with ZeRO.
- Provide a uniform checkpoint format for all kinds of parallelism.
- Optimize ZeRO-offload and improve model scaling.
- Design a uniform model memory tracer.
- Implement an efficient hybrid Adam with both CPU and CUDA kernels.
- Improve activation offloading.
- Release a beta version of the profiler's TensorBoard plugin.
- Refactor the pipeline module for closer integration with the engine.
- Add Chinese tutorials and open WeChat and Slack user groups.
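The uniform memory tracer above works by tagging each parameter and gradient payload with a lifecycle state and periodically sampling per-device usage. A rough sketch of the idea in plain Python, with illustrative class and method names (not ColossalAI's actual API):

```python
from enum import Enum, auto

class TensorState(Enum):
    FREE = auto()     # payload released
    HOLD = auto()     # payload kept but idle
    COMPUTE = auto()  # payload in use by an operator

class StatefulTensor:
    """A tensor payload whose lifecycle state and location are tracked."""
    def __init__(self, numel, element_size, device):
        self.numel = numel
        self.element_size = element_size  # bytes per element, e.g. 2 for fp16
        self.device = device
        self.state = TensorState.HOLD

    @property
    def payload_bytes(self):
        return self.numel * self.element_size

class MemStatsCollector:
    """Samples model-data memory per device from registered tensors."""
    def __init__(self):
        self._tensors = []
        self.samples = []

    def register(self, tensor):
        self._tensors.append(tensor)

    def sample(self):
        usage = {}
        for t in self._tensors:
            if t.state is not TensorState.FREE:
                usage[t.device] = usage.get(t.device, 0) + t.payload_bytes
        self.samples.append(usage)
        return usage

collector = MemStatsCollector()
param = StatefulTensor(numel=1000, element_size=2, device="cuda")  # fp16 param
grad = StatefulTensor(numel=1000, element_size=2, device="cuda")   # fp16 grad
collector.register(param)
collector.register(grad)

print(collector.sample())  # {'cuda': 4000}
param.device = "cpu"       # parameter payload offloaded to host memory
print(collector.sample())  # {'cpu': 2000, 'cuda': 2000}
```

Sampling non-model data (activations, buffers) on top of this is then a matter of subtracting the traced model data from the allocator's total, which is what the non-model-data tracing PRs below add.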
## What's Changed
### Features
- [zero] get memory usage for sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/536
- [zero] improve the accuracy of get_memory_usage of sharded param by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/538
- [zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/537
- [zero] get memory usage of sharded optim v2 by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/542
- [zero] polish ZeroInitContext by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/540
- [zero] optimize grad offload by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/539
- [zero] non model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/545
- [zero] add zero config to neutralize zero context init by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/546
- [zero] dump memory stats for sharded model by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/548
- [zero] add stateful tensor by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/549
- [zero] label state for param fp16 and grad by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/551
- [zero] hijack p.grad in sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/554
- [utils] update colo tensor moving APIs by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/553
- [polish] rename col_attr -> colo_attr by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/558
- [zero] trace states of fp16/32 grad and fp32 param by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/571
- [zero] adapt zero for unsharded parameters by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/561
- [refactor] memory utils by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/577
- Feature/checkpoint gloo by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/589
- [zero] add sampling time for memstats collector by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/610
- [model checkpoint] checkpoint utils by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/592
- [model checkpoint][hotfix] unified layers for save&load by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/593
- Feature/checkpoint 2D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/595
- Feature/checkpoint 1D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/594
- [model checkpoint] CPU communication ops by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/590
- Feature/checkpoint 2.5D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/596
- Feature/Checkpoint 3D by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/597
- [model checkpoint] checkpoint hook by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/598
- Feature/Checkpoint tests by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/599
- [zero] adapt zero for unsharded parameters (Optimizer part) by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/601
- [zero] polish init context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/645
- refactor pipeline: put runtime schedule into engine by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/627
### Bug Fixes
- [Zero] process no-leaf-module in Zero by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/535
- Add gather_out arg to Linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/541
- [hotfix] fix parallel_input flag for Linear1D_Col gather_output by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/579
- [hotfix] add hybrid adam to init by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/584
- Hotfix/path check util by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/591
- [hotfix] fix sharded optim zero grad by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/604
- Add tensor parallel input check by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/621
- [hotfix] Raise messages for indivisible batch sizes with tensor parallelism by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/622
- [zero] fixed the activation offload by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/647
- fixed bugs in CPU adam by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/633
- Revert "[zero] polish init context" by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/657
- [hotfix] fix a bug in model data stats tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/655
- fix bugs for unsharded parameters when restoring data by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/664
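Several of the fixes above touch the CPU and hybrid Adam kernels. For reference, this is the update rule those fused kernels implement, shown as a single-element plain-Python sketch rather than the actual vectorized implementation:

```python
import math

def adam_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update; the fused CPU (vectorized) and CUDA kernels
    apply the same math to whole tensors at once."""
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** step)        # bias correction
    v_hat = v / (1 - beta2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# A hybrid optimizer routes each parameter to the kernel matching the
# device its master copy lives on (CPU when offloaded, CUDA otherwise).
p, m, v = adam_step(p=1.0, g=0.1, m=0.0, v=0.0, step=1)
print(round(p, 6))  # 0.999: the first step moves p by roughly lr
```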
### Unit Testing
- [zero] test zero tensor utils by @FredHuang99 in https://github.com/hpcaitech/ColossalAI/pull/609
- remove hybrid adam in test_moe_zero_optim by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/659
### Documentation
- Refactored docstring to google style by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/532
- [docs] updated docs of hybrid adam and cpu adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/552
- html refactor by @number1roy in https://github.com/hpcaitech/ColossalAI/pull/555
- [doc] polish docstring of zero by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/612
- [doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/615
- [doc] polish amp docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/616
- [doc] polish moe docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/618
- [doc] polish optimizer docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/619
- [doc] polish utils docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/620
- [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu … by @GaryGky in https://github.com/hpcaitech/ColossalAI/pull/625
- [doc] polish checkpoint docstring by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/637
- update GPT-2 experiment result by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/666
- [NFC] polish code by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/646
### Model Zoo
- [model zoo] add activation offload for gpt model by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/582
### Miscellaneous
- [logging] polish logger format by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/543
- [profiler] add MemProfiler by @raejaf in https://github.com/hpcaitech/ColossalAI/pull/356
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/501
- [tool] create .clang-format for pre-commit by @BoxiangW in https://github.com/hpcaitech/ColossalAI/pull/578
- [GitHub] Add prefix and label in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/652
**Full Changelog**: https://github.com/hpcaitech/ColossalAI/compare/v0.1.1...v0.1.2