v0.1.1
Release date: 2022-03-26 15:19:42
What's Changed
Features
- [MOE] changed parallelmode to dist process group by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/460
- [MOE] redirect moe_env from global_variables to core by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/467
- [zero] zero init ctx receives a dp process group by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/471
- [zero] ZeRO supports pipeline parallel by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/477
- add LinearGate for MOE in NaiveAMP context by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/480
- [zero] polish sharded param name by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/484
- [zero] sharded optim support hybrid cpu adam by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/486
- [zero] polish sharded optimizer v2 by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/490
- [MOE] support PR-MOE by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/488
- [zero] sharded model manages ophooks individually by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/492
- [MOE] remove old MoE legacy by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/493
- [zero] sharded model support the reuse of fp16 shard by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/495
- [polish] polish singleton and global context by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/500
- [memory] add model data tensor moving api by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/503
- [memory] set cuda mem frac by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/506
- [zero] use colo model data api in sharded optimv2 by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/511
- [MOE] add MOEGPT model by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/510
- [zero] zero init ctx enable rm_torch_payload_on_the_fly by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/512
- [zero] show model data cuda memory usage after zero context init. by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/515
- [log] polish disable_existing_loggers by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/519
- [zero] add model data tensor inline moving API by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/521
- [cuda] modify the fused adam, support hybrid of fp16 and fp32 by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/497
- [zero] refactor model data tracing by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/522
- [zero] added hybrid adam, removed loss scale in adam by @Gy-Lu in https://github.com/hpcaitech/ColossalAI/pull/527
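Many of the features above center on ZeRO-style sharding, where each data-parallel rank stores optimizer state (e.g. Adam moments) only for its own shard of the parameters instead of a full replica. The toy sketch below illustrates that idea in plain Python; it is not ColossalAI's actual API, and the names (`shard_indices`, `ShardedAdamState`) are invented for illustration.

```python
# Toy sketch of ZeRO-style optimizer-state sharding: each data-parallel
# rank keeps Adam moments only for the parameters it owns, cutting
# optimizer memory roughly by the data-parallel world size.

def shard_indices(num_params: int, rank: int, world_size: int) -> range:
    """Contiguous slice of parameter indices owned by `rank`."""
    per_rank = (num_params + world_size - 1) // world_size
    start = rank * per_rank
    return range(start, min(start + per_rank, num_params))

class ShardedAdamState:
    """Holds first/second Adam moments only for this rank's shard."""
    def __init__(self, num_params: int, rank: int, world_size: int):
        self.owned = shard_indices(num_params, rank, world_size)
        self.m = {i: 0.0 for i in self.owned}  # first moment, owned params only
        self.v = {i: 0.0 for i in self.owned}  # second moment, owned params only

    def step(self, params, grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Update only the owned shard; in real ZeRO the freshly updated
        # parameter values are then all-gathered back to the other ranks.
        for i in self.owned:
            self.m[i] = beta1 * self.m[i] + (1 - beta1) * grads[i]
            self.v[i] = beta2 * self.v[i] + (1 - beta2) * grads[i] ** 2
            params[i] -= lr * self.m[i] / (self.v[i] ** 0.5 + eps)
        return params
```

With 8 parameters and a data-parallel world size of 4, rank 0 holds moments for only 2 of the 8 parameters, so full optimizer state exists nowhere in replicated form; the "hybrid" Adam variants above extend this by keeping part of that state in CPU memory.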
Bug Fixes
- fix discussion button in issue template by @binmakeswell in https://github.com/hpcaitech/ColossalAI/pull/504
- [zero] fix grad offload by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/528
Unit Testing
- [MOE] add unit tests for MOE experts layout, gradient handler and kernel by @1SAA in https://github.com/hpcaitech/ColossalAI/pull/469
- [test] added rerun on exception for testing by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/475
- [zero] fix init device bug in zero init context unittest by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/516
- [test] fixed rerun_on_exception and adapted test cases by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/487
CI/CD
- [devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/505
- [devops] remove tsinghua source for pip by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/507
- [devops] recover tsinghua pip source due to proxy issue by @FrankLeeeee in https://github.com/hpcaitech/ColossalAI/pull/509
Documentation
- [doc] update rst by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/470
- Update Experiment result about Colossal-AI with ZeRO by @Sze-qq in https://github.com/hpcaitech/ColossalAI/pull/479
- [doc] docs get correct release version by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/489
- Update README.md by @fastalgo in https://github.com/hpcaitech/ColossalAI/pull/514
- [doc] update apidoc by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/530
Model Zoo
- [model zoo] fix attn mask shape of gpt by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/472
- [model zoo] gpt embedding remove attn mask by @ver217 in https://github.com/hpcaitech/ColossalAI/pull/474
Miscellaneous
- [install] run without rich by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/513
- [refactor] remove old zero code by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/517
- [format] polish name format for MOE by @feifeibear in https://github.com/hpcaitech/ColossalAI/pull/481
New Contributors
- @fastalgo made their first contribution in https://github.com/hpcaitech/ColossalAI/pull/514
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.0...v0.1.1