v0.1.9
Released: 2022-08-11 21:16:46
Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)
What's Changed
Zero
- [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
- [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
- [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
- [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
- [zero] add AgChunk (#1417) by HELSON
- [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
- [zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
- [zero] chunk manager allows filtering ex-large params (#1393) by ver217
- [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217
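Several of the entries above revolve around grouping parameters into fixed-size chunks for all-gather, including isolating ex-large parameters (#1393) and searching for good chunk sizes (#1436). As a rough illustration of the idea only (hypothetical helper, not ColossalAI's actual ChunkManager), a greedy packer that assigns parameter sizes to chunks with a fixed element budget might look like:

```python
def pack_into_chunks(param_numels, chunk_size):
    """Greedily pack parameter sizes into chunks of at most chunk_size
    elements; a parameter larger than chunk_size gets its own chunk."""
    chunks, current, used = [], [], 0
    for n in param_numels:
        if n > chunk_size:           # ex-large param: isolate it
            if current:
                chunks.append(current)
                current, used = [], 0
            chunks.append([n])
            continue
        if used + n > chunk_size:    # budget exceeded: close the chunk
            chunks.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        chunks.append(current)
    return chunks

print(pack_into_chunks([4, 3, 5, 10, 2], chunk_size=8))
# → [[4, 3], [5], [10], [2]]
```

Packing parameters contiguously like this lets one all-gather move a whole chunk instead of issuing a collective per parameter.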
Fx
- [fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) by Super Daniel
- [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) by Super Daniel
- [fx] fixed torchaudio conformer tracing (#1392) by Frank Lee
- [fx] patched torch.max and data movement operator (#1391) by Frank Lee
- [fx] fixed indentation error in checkpointing codegen (#1385) by Frank Lee
- [fx] patched torch.full for huggingface opt (#1386) by Frank Lee
- [fx] update split module pass and add customized policy (#1373) by YuliangLiu0306
- [fx] add torchaudio test (#1369) by Super Daniel
- [fx] Add colotracer compatibility test on torchrec (#1370) by Boyuan Yao
- [fx] add gpt2 passes for pipeline performance test (#1366) by YuliangLiu0306
- [fx] added activation checkpoint codegen support for torch < 1.12 (#1359) by Frank Lee
- [fx] added activation checkpoint codegen (#1355) by Frank Lee
- [fx] fixed apex normalization patch exception (#1352) by Frank Lee
- [fx] added activation checkpointing annotation (#1349) by Frank Lee
- [fx] update MetaInfoProp pass to process more complex node.meta (#1344) by YuliangLiu0306
- [fx] refactor tracer to trace complete graph (#1342) by YuliangLiu0306
- [fx] tested the complete workflow for auto-parallel (#1336) by Frank Lee
- [fx] refactor tracer (#1335) by YuliangLiu0306
- [fx] recovered skipped pipeline tests (#1338) by Frank Lee
- [fx] fixed compatibility issue with torch 1.10 (#1331) by Frank Lee
- [fx] fixed unit tests for torch 1.12 (#1327) by Frank Lee
- [fx] add balanced policy v2 (#1251) by YuliangLiu0306
- [fx] Add unit test and fix bugs for transform_mlp_pass (#1299) by XYE
- [fx] added apex normalization to patched modules (#1300) by Frank Lee
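Many of the [fx] items above concern generating code for activation checkpointing: discarding intermediate activations during the forward pass and recomputing them during backward, trading compute for memory. A framework-free sketch of that trade-off (hypothetical helper names, not the actual codegen output):

```python
def forward_with_checkpoints(funcs, x, every=2):
    """Run a chain of functions, saving only every `every`-th input
    (the checkpoints); return the final output and the saved inputs."""
    saved = []
    for i, f in enumerate(funcs):
        if i % every == 0:
            saved.append((i, x))
        x = f(x)
    return x, saved

def recompute_activation(funcs, saved, idx):
    """Recover the input to funcs[idx] by replaying from the nearest
    earlier checkpoint, as a backward pass would."""
    start, x = max((s for s in saved if s[0] <= idx), key=lambda s: s[0])
    for f in funcs[start:idx]:
        x = f(x)
    return x

funcs = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
out, saved = forward_with_checkpoints(funcs, 1, every=2)
# out == 1; only 2 of 4 activations are kept, the rest are recomputable
```

With `every=2`, half of the intermediate activations never need to be stored; the recompute cost is bounded by the checkpoint spacing.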
Recommendation System
- [FAW] export FAW in _ops (#1438) by Jiarui Fang
- [FAW] move coloparam setting in test code. (#1429) by Jiarui Fang
- [FAW] parallel FreqAwareEmbedding (#1424) by Jiarui Fang
- [FAW] add cache manager for the cached embedding (#1419) by Jiarui Fang
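The FreqAwareEmbedding (FAW) work keeps frequently accessed embedding rows in fast device memory while the long tail stays in a slower tier. A toy software-managed row cache illustrating that idea (hypothetical, not the library's actual cache manager):

```python
from collections import OrderedDict

class RowCache:
    """LRU cache of embedding rows: hot rows stay in the fast tier,
    cold rows are fetched from the slow store on demand."""
    def __init__(self, slow_store, capacity):
        self.slow = slow_store        # e.g. rows living in host memory
        self.cap = capacity
        self.fast = OrderedDict()     # row_id -> row, in LRU order
        self.misses = 0

    def lookup(self, row_id):
        if row_id in self.fast:
            self.fast.move_to_end(row_id)    # mark as recently used
        else:
            self.misses += 1
            if len(self.fast) >= self.cap:   # evict least recently used
                self.fast.popitem(last=False)
            self.fast[row_id] = self.slow[row_id]
        return self.fast[row_id]

store = {i: [float(i)] * 4 for i in range(10)}
cache = RowCache(store, capacity=2)
for rid in [1, 2, 1, 3, 1]:
    cache.lookup(rid)
# 3 misses; rows 1 and 3 remain in the fast tier
```

Real recommendation workloads are heavily skewed, so even a small fast tier absorbs most lookups.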
Global Tensor
- [tensor] add shape consistency feature to support auto spec transform (#1418) by YuliangLiu0306
- [tensor] build sharding spec to replace distspec in future. (#1405) by YuliangLiu0306
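A sharding spec describes how each dimension of a global tensor is partitioned across a logical device mesh; a shape-consistency pass (#1418) then converts tensors between specs automatically. A minimal sketch (hypothetical names, not ColossalAI's ShardingSpec) computing the local shard shape on a 2D mesh:

```python
def local_shard_shape(global_shape, mesh_shape, dim_partition):
    """dim_partition maps tensor dim -> mesh axis it is sharded over;
    unlisted dims are replicated. Assumes even divisibility."""
    shape = list(global_shape)
    for tensor_dim, mesh_axis in dim_partition.items():
        assert shape[tensor_dim] % mesh_shape[mesh_axis] == 0
        shape[tensor_dim] //= mesh_shape[mesh_axis]
    return tuple(shape)

# Shard dim 0 over mesh axis 0 and dim 1 over mesh axis 1
print(local_shard_shape((8, 12), mesh_shape=(2, 4), dim_partition={0: 0, 1: 1}))
# → (4, 3)
```

Converting between two such specs is what requires communication (all-gather, all-to-all, etc.), which is exactly what a shape-consistency pass must insert.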
Hotfix
- [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
- [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
- [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
- [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
- [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
- [hotfix] fix zero optim save/load state dict (#1381) by ver217
- [hotfix] fix zero ddp buffer cast (#1376) by ver217
- [hotfix] fix no optimizer in save/load (#1363) by HELSON
- [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
- [hotfix] ZeroDDP use new process group (#1333) by ver217
- [hotfix] shared model returns cpu state_dict (#1328) by ver217
- [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
- [hotfix] fix unit test test_module_spec (#1321) by HELSON
- [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
- [hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
- [hotfix] add missing file (#1308) by Jiarui Fang
- [hotfix] remove potential circle import (#1307) by Jiarui Fang
- [hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
- [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
- [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang
Communication
- [communication] add p2p_v2.py to support communication with List[Any] (#1407) by Kirigaya Kazuto
Device
- [device] add DeviceMesh class to support logical device layout (#1394) by YuliangLiu0306
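A device mesh gives physical ranks a logical layout (e.g. rows for pipeline stages, columns for tensor parallelism) so that sharding decisions can refer to mesh axes instead of raw ranks. A tiny sketch of the rank-to-coordinate mapping (hypothetical helper, not the DeviceMesh API):

```python
def mesh_coordinate(rank, mesh_shape):
    """Map a flat global rank to its (row, col) logical coordinate
    on a 2D device mesh, laid out row-major."""
    rows, cols = mesh_shape
    assert 0 <= rank < rows * cols
    return divmod(rank, cols)

# 8 GPUs arranged as a 2 x 4 logical mesh
print(mesh_coordinate(5, (2, 4)))
# → (1, 1)
```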
Chunk
- [chunk] add PG check for tensor appending (#1383) by Jiarui Fang
DDP
- [DDP] test ddp state dict uses more strict threshold (#1382) by ver217
Checkpoint
- [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
- [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
- [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
- [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
- [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
- [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang
Util
- [util] standard checkpoint function naming (#1377) by Frank Lee
Nvme
- [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217
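NVMe offload moves optimizer states out of CPU memory onto disk, reading each parameter's state back only for the duration of its update step. A toy sketch of the pattern using a temp directory and pickle (illustrative only, not the tensornvme backend used by CPUAdam/HybridAdam):

```python
import os
import pickle
import tempfile

class DiskOffloadedState:
    """Keep per-parameter optimizer state on disk; load it for one
    update, then write it back, so RAM holds one state at a time."""
    def __init__(self):
        self.dir = tempfile.mkdtemp()

    def _path(self, name):
        return os.path.join(self.dir, name + ".bin")

    def write(self, name, state):
        with open(self._path(name), "wb") as f:
            pickle.dump(state, f)

    def update(self, name, fn):
        with open(self._path(name), "rb") as f:
            state = pickle.load(f)   # fetch state for this param only
        state = fn(state)            # e.g. an Adam moment update
        self.write(name, state)      # spill it back to disk
        return state

store = DiskOffloadedState()
store.write("w1", {"step": 0, "m": 0.0})
new = store.update("w1", lambda s: {**s, "step": s["step"] + 1})
```

The real backend uses asynchronous NVMe I/O to overlap reads and writes with compute, but the lifecycle (load, update, spill) is the same.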
Colotensor
- [colotensor] use cpu memory to store state_dict (#1367) by HELSON
- [colotensor] add Tensor.view op and its unit test (#1343) by HELSON
Unit test
- [unit test] add megatron init test in zero_optim (#1358) by HELSON
Docker
- [docker] add tensornvme in docker (#1354) by ver217
Doc
- [doc] update rst and docstring (#1351) by ver217
Refactor
- [refactor] refactor ColoTensor's unit tests (#1340) by HELSON
Workflow
- [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
- [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
- [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
- [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
- [workflow] updated release bdist workflow (#1318) by Frank Lee
- [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
- [workflow] updated pytorch compatibility test (#1311) by Frank Lee
Test
- [test] removed outdated unit test for meta context (#1329) by Frank Lee
Utils
- [utils] integrated colotensor with lazy init context (#1324) by Frank Lee
Optimizer
- [Optimizer] Remove useless ColoOptimizer (#1312) by Jiarui Fang
- [Optimizer] polish the init method of ColoOptimizer (#1310) by Jiarui Fang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.8...v0.1.9