v0.1.6
Release date: 2022-06-02 14:31:03
Latest release of hpcaitech/ColossalAI: v0.4.4 (2024-09-19 10:53:35)
Main features
- ColoTensor supports hybrid parallel (tensor parallel and data parallel)
- ColoTensor supports ZeRO (with chunk)
- Config tensor parallel by module via ColoTensor
- ZeroInitContext and ShardedModelV2 support loading checkpoints and Hugging Face from_pretrained()
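To give a rough feel for what "ZeRO (with chunk)" and a chunk size search involve, here is a minimal pure-Python sketch. It is an illustration only, not ColossalAI's implementation: the function names (`pack_into_chunks`, `wasted_space`, `search_chunk_size`), the greedy in-order packing, and the "least padding wins" criterion are all assumptions made for the example.

```python
from typing import Dict, List


def pack_into_chunks(param_sizes: Dict[str, int], chunk_size: int) -> List[List[str]]:
    """Greedily pack parameters, in insertion order, into fixed-size chunks.

    Hypothetical helper: each chunk holds at most `chunk_size` elements.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in param_sizes.items():
        if size > chunk_size:
            raise ValueError(f"{name} ({size}) does not fit in a chunk of {chunk_size}")
        if used + size > chunk_size:
            # The current chunk is full enough; close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        chunks.append(current)
    return chunks


def wasted_space(param_sizes: Dict[str, int], chunk_size: int) -> int:
    """Number of padded (unused) elements across all chunks."""
    chunks = pack_into_chunks(param_sizes, chunk_size)
    return len(chunks) * chunk_size - sum(param_sizes.values())


def search_chunk_size(param_sizes: Dict[str, int], candidates: List[int]) -> int:
    """Pick the candidate that wastes the least space (ties go to the smaller size)."""
    largest = max(param_sizes.values())
    feasible = [c for c in candidates if c >= largest]
    return min(feasible, key=lambda c: (wasted_space(param_sizes, c), c))


# Toy parameter sizes (element counts), assumed for the example.
sizes = {"a": 6, "b": 4, "c": 5, "d": 5}
best = search_chunk_size(sizes, [8, 10, 12])
print(best, pack_into_chunks(sizes, best))
```

In this toy setup, a chunk size of 10 packs the four parameters into two full chunks, whereas 8 and 12 both leave padding. A real chunk manager would also have to account for dtype, device placement, and communication cost, which this sketch ignores.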
What's Changed
ColoTensor
- [tensor] refactor colo-tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/992
- [tensor] refactor parallel action by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1007
- [tensor] impl ColoDDP for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1009
- [Tensor] add module handler for linear by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1021
- [Tensor] add module check and bert test by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1031
- [Tensor] add Parameter inheritance for ColoParameter by @Wesley-Jzy in https://github.com/hpcaitech/ColossalAI/pull/1041
- [tensor] ColoTensor supports ZeRo by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1015
- [zero] add chunk size search for chunk manager by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1052
Zero
- [zero] add load_state_dict for sharded model by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/894
- [zero] add zero optimizer for ColoTensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1046
Hotfix
- [hotfix] fix colo init context by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1026
- [hotfix] fix some bugs caused by size mismatch. by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1011
- [kernel] fixed the include bug in dropout kernel by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/999
- fix typo in constants by @ryanrussell in https://github.com/hpcaitech/ColossalAI/pull/1027
- [engine] fixed bug in gradient accumulation dataloader to keep the last step by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1030
- [hotfix] fix dist spec mgr by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1045
- [hotfix] fix import error in sharded model v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1053
Unit test
- [unit test] refactor test tensor by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/1005
CI
- [ci] update the docker image name by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1017
- [ci] added nightly build (#1018) by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1019
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1022
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1029
- [ci] fixed nightly build workflow by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1040
CLI
- [cli] remove unused imports by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1001
Documentation
- Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/987
- [doc] update docker instruction by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1020
Misc
- [NFC] Hotfix/format by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/984
- Revert "[NFC] Hotfix/format" by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/986
- remove useless import in tensor dir by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/997
- [NFC] fix download link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/998
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1003
- [NFC] polish colossalai/kernel/cuda_native/csrc/colossal_C_frontend.c… by @zhengzangw in https://github.com/hpcaitech/ColossalAI/pull/1010
- [NFC] fix paper link by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1012
- [p2p]add object list send/recv by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1024
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1034
- [NFC] add inference by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1044
- [titans]remove model zoo by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1042
- [NFC] add inference submodule in path by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/1047
- [release] update version.txt by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/1048
- [Bot] Synchronize Submodule References by @github-actions in https://github.com/hpcaitech/ColossalAI/pull/1049
- updated collective ops api by @kurisusnowdeng in https://github.com/hpcaitech/ColossalAI/pull/1054
- [pipeline]refactor ppschedule to support tensor list by @YuliangLiu0306 in https://github.com/hpcaitech/ColossalAI/pull/1050
New Contributors
- @ryanrussell made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/1027
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.5...v0.1.6