v0.1.7
Release time: 2022-06-21 12:10:42
Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)
Version v0.1.7 Released Today
Highlights
- Started torch.fx for auto-parallel training
- Updated the ZeRO mechanism with ColoTensor
- Fixed various bugs
What's Changed
Hotfix
- [hotfix] prevent nested ZeRO (#1140) by ver217
- [hotfix] fix bugs caused by refactored pipeline (#1133) by YuliangLiu0306
- [hotfix] fix param op hook (#1131) by ver217
- [hotfix] fix zero init ctx numel (#1128) by ver217
- [hotfix] change to fit latest p2p (#1100) by YuliangLiu0306
- [hotfix] fix chunk comm src rank (#1072) by ver217
Zero
- [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee
- [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
- [zero] fixed api consistency (#1098) by Frank Lee
- [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217
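
The log-level change in #1137 can be illustrated with a minimal sketch using only Python's standard `logging` module. This is not ColossalAI's actual code: the `make_logger` and `run_hook` helpers and the message text are hypothetical, chosen only to show why a message emitted at debug level stops appearing once the logger runs at the usual INFO level.

```python
import io
import logging


def make_logger(level):
    """Build an isolated logger that writes to an in-memory buffer (hypothetical helper)."""
    buffer = io.StringIO()
    logger = logging.getLogger(f"zero_hook_demo_{level}")
    logger.handlers.clear()
    logger.propagate = False  # keep output out of the root logger
    logger.setLevel(level)
    logger.addHandler(logging.StreamHandler(buffer))
    return logger, buffer


def run_hook(logger, steps):
    """Hypothetical hook that reports once per step at debug level."""
    for step in range(steps):
        logger.debug("zero hook fired at step %d", step)


# At INFO level the per-step debug messages are filtered out.
info_logger, info_out = make_logger(logging.INFO)
run_hook(info_logger, 3)

# At DEBUG level the same messages are still available for troubleshooting.
debug_logger, debug_out = make_logger(logging.DEBUG)
run_hook(debug_logger, 3)
```

With this arrangement, routine runs stay quiet while the messages remain recoverable by lowering the logger level to DEBUG.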
Optim
- [optim] refactor fused sgd (#1134) by ver217
Ddp
- [ddp] add save/load state dict for ColoDDP (#1127) by ver217
- [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
- [ddp] supported customized torch ddp configuration (#1123) by Frank Lee
Pipeline
- [pipeline] support List of Dict data (#1125) by YuliangLiu0306
- [pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) by Frank Lee
- [pipeline] refactor the pipeline module (#1087) by Frank Lee
Fx
- [fx] add autoparallel passes (#1121) by YuliangLiu0306
- [fx] added unit test for coloproxy (#1119) by Frank Lee
- [fx] added coloproxy (#1115) by Frank Lee
Gemini
- [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217
- [gemini] zero supports gemini (#1093) by ver217
Test
- [test] fixed hybrid parallel test case on 8 GPUs (#1106) by Frank Lee
- [test] skip tests when not enough GPUs are detected (#1090) by Frank Lee
- [test] ignore 8 gpu test (#1080) by Frank Lee
Release
- [release] update version.txt (#1103) by Frank Lee
Tensor
- [tensor] refactor param op hook (#1097) by ver217
- [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) by ver217
- [Tensor] fix equal assert (#1091) by Ziyue Jiang
- [Tensor] 1d row embedding (#1075) by Ziyue Jiang
- [tensor] chunk manager monitor mem usage (#1076) by ver217
- [Tensor] fix optimizer for CPU parallel (#1069) by Ziyue Jiang
- [Tensor] add hybrid device demo and fix bugs (#1059) by Ziyue Jiang
Amp
- [amp] included dict for type casting of model output (#1102) by Frank Lee
Workflow
- [workflow] fixed 8-gpu test workflow (#1101) by Frank Lee
- [workflow] added regular 8 GPU testing (#1099) by Frank Lee
- [workflow] disable p2p via shared memory on non-nvlink machine (#1086) by Frank Lee
Engine
- [engine] fixed empty op hook check (#1096) by Frank Lee
Doc
- [doc] added documentation to chunk and chunk manager (#1094) by Frank Lee
Context
- [context] support lazy init of module (#1088) by Frank Lee
- [context] maintain the context object in with statement (#1073) by Frank Lee
Refactory
- [refactory] add nn.parallel module (#1068) by Jiarui Fang
Cudnn
- [cudnn] set False to cudnn benchmark by default (#1063) by Frank Lee
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.7...v0.1.6