v0.1.9
Released: 2022-08-11 21:16:46
Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)
What's Changed
Zero
- [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
- [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
- [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
- [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
- [zero] add AgChunk (#1417) by HELSON
- [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
- [zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
- [zero] chunk manager allows filtering ex-large params (#1393) by ver217
- [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217
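Several of the entries above revolve around grouping parameters into fixed-size chunks for all-gather, including isolating ex-large parameters (#1393) and searching for good chunk sizes (#1436). As a rough illustration of the idea only (hypothetical helper, not ColossalAI's actual ChunkManager), a greedy packer that assigns parameter sizes to chunks with a fixed element budget might look like:

```python
def pack_into_chunks(param_numels, chunk_size):
    """Greedily pack parameter sizes into chunks of at most chunk_size
    elements; a parameter larger than chunk_size gets its own chunk."""
    chunks, current, used = [], [], 0
    for n in param_numels:
        if n > chunk_size:           # ex-large param: isolate it
            if current:
                chunks.append(current)
                current, used = [], 0
            chunks.append([n])
            continue
        if used + n > chunk_size:    # budget exceeded: close the chunk
            chunks.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        chunks.append(current)
    return chunks

print(pack_into_chunks([4, 3, 5, 10, 2], chunk_size=8))
# → [[4, 3], [5], [10], [2]]
```

Packing parameters contiguously like this lets one all-gather move a whole chunk instead of issuing a collective per parameter.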
Fx
- [fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) by Super Daniel
- [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) by Super Daniel
- [fx] fixed torchaudio conformer tracing (#1392) by Frank Lee
- [fx] patched torch.max and data movement operator (#1391) by Frank Lee
- [fx] fixed indentation error in checkpointing codegen (#1385) by Frank Lee
- [fx] patched torch.full for huggingface opt (#1386) by Frank Lee
- [fx] update split module pass and add customized policy (#1373) by YuliangLiu0306
- [fx] add torchaudio test (#1369) by Super Daniel
- [fx] Add colotracer compatibility test on torchrec (#1370) by Boyuan Yao
- [fx] add gpt2 passes for pipeline performance test (#1366) by YuliangLiu0306
- [fx] added activation checkpoint codegen support for torch < 1.12 (#1359) by Frank Lee
- [fx] added activation checkpoint codegen (#1355) by Frank Lee
- [fx] fixed apex normalization patch exception (#1352) by Frank Lee
- [fx] added activation checkpointing annotation (#1349) by Frank Lee
- [fx] update MetaInfoProp pass to process more complex node.meta (#1344) by YuliangLiu0306
- [fx] refactor tracer to trace complete graph (#1342) by YuliangLiu0306
- [fx] tested the complete workflow for auto-parallel (#1336) by Frank Lee
- [fx] refactor tracer (#1335) by YuliangLiu0306
- [fx] recovered skipped pipeline tests (#1338) by Frank Lee
- [fx] fixed compatibility issue with torch 1.10 (#1331) by Frank Lee
- [fx] fixed unit tests for torch 1.12 (#1327) by Frank Lee
- [fx] add balanced policy v2 (#1251) by YuliangLiu0306
- [fx] Add unit test and fix bugs for transform_mlp_pass (#1299) by XYE
- [fx] added apex normalization to patched modules (#1300) by Frank Lee
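Many of the [fx] items above concern generating code for activation checkpointing: discarding intermediate activations during the forward pass and recomputing them during backward, trading compute for memory. A framework-free sketch of that trade-off (hypothetical helper names, not the actual codegen output):

```python
def forward_with_checkpoints(funcs, x, every=2):
    """Run a chain of functions, saving only every `every`-th input
    (the checkpoints); return the final output and the saved inputs."""
    saved = []
    for i, f in enumerate(funcs):
        if i % every == 0:
            saved.append((i, x))
        x = f(x)
    return x, saved

def recompute_activation(funcs, saved, idx):
    """Recover the input to funcs[idx] by replaying from the nearest
    earlier checkpoint, as a backward pass would."""
    start, x = max((s for s in saved if s[0] <= idx), key=lambda s: s[0])
    for f in funcs[start:idx]:
        x = f(x)
    return x

funcs = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
out, saved = forward_with_checkpoints(funcs, 1, every=2)
# out == 1; only 2 of 4 activations are kept, the rest are recomputable
```

With `every=2`, half of the intermediate activations never need to be stored; the recompute cost is bounded by the checkpoint spacing.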
Recommendation System
- [FAW] export FAW in _ops (#1438) by Jiarui Fang
- [FAW] move coloparam setting in test code. (#1429) by Jiarui Fang
- [FAW] parallel FreqAwareEmbedding (#1424) by Jiarui Fang
- [FAW] add cache manager for the cached embedding (#1419) by Jiarui Fang
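The FreqAwareEmbedding (FAW) work keeps frequently accessed embedding rows in fast device memory while the long tail stays in a slower tier. A toy software-managed row cache illustrating that idea (hypothetical, not the library's actual cache manager):

```python
from collections import OrderedDict

class RowCache:
    """LRU cache of embedding rows: hot rows stay in the fast tier,
    cold rows are fetched from the slow store on demand."""
    def __init__(self, slow_store, capacity):
        self.slow = slow_store        # e.g. rows living in host memory
        self.cap = capacity
        self.fast = OrderedDict()     # row_id -> row, in LRU order
        self.misses = 0

    def lookup(self, row_id):
        if row_id in self.fast:
            self.fast.move_to_end(row_id)    # mark as recently used
        else:
            self.misses += 1
            if len(self.fast) >= self.cap:   # evict least recently used
                self.fast.popitem(last=False)
            self.fast[row_id] = self.slow[row_id]
        return self.fast[row_id]

store = {i: [float(i)] * 4 for i in range(10)}
cache = RowCache(store, capacity=2)
for rid in [1, 2, 1, 3, 1]:
    cache.lookup(rid)
# 3 misses; rows 1 and 3 remain in the fast tier
```

Real recommendation workloads are heavily skewed, so even a small fast tier absorbs most lookups.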
Global Tensor
- [tensor] add shape consistency feature to support auto spec transform (#1418) by YuliangLiu0306
- [tensor] build sharding spec to replace distspec in future. (#1405) by YuliangLiu0306
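A sharding spec describes how each dimension of a global tensor is partitioned across a logical device mesh; a shape-consistency pass (#1418) then converts tensors between specs automatically. A minimal sketch (hypothetical names, not ColossalAI's ShardingSpec) computing the local shard shape on a 2D mesh:

```python
def local_shard_shape(global_shape, mesh_shape, dim_partition):
    """dim_partition maps tensor dim -> mesh axis it is sharded over;
    unlisted dims are replicated. Assumes even divisibility."""
    shape = list(global_shape)
    for tensor_dim, mesh_axis in dim_partition.items():
        assert shape[tensor_dim] % mesh_shape[mesh_axis] == 0
        shape[tensor_dim] //= mesh_shape[mesh_axis]
    return tuple(shape)

# Shard dim 0 over mesh axis 0 and dim 1 over mesh axis 1
print(local_shard_shape((8, 12), mesh_shape=(2, 4), dim_partition={0: 0, 1: 1}))
# → (4, 3)
```

Converting between two such specs is what requires communication (all-gather, all-to-all, etc.), which is exactly what a shape-consistency pass must insert.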
Hotfix
- [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
- [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
- [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
- [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
- [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
- [hotfix] fix zero optim save/load state dict (#1381) by ver217
- [hotfix] fix zero ddp buffer cast (#1376) by ver217
- [hotfix] fix no optimizer in save/load (#1363) by HELSON
- [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
- [hotfix] ZeroDDP use new process group (#1333) by ver217
- [hotfix] shared model returns cpu state_dict (#1328) by ver217
- [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
- [hotfix] fix unit test test_module_spec (#1321) by HELSON
- [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
- [hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
- [hotfix] add missing file (#1308) by Jiarui Fang
- [hotfix] remove potential circle import (#1307) by Jiarui Fang
- [hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
- [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
- [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang
Communication
- [communication] add p2p_v2.py to support communication with List[Any] (#1407) by Kirigaya Kazuto
Device
- [device] add DeviceMesh class to support logical device layout (#1394) by YuliangLiu0306
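A device mesh gives physical ranks a logical layout (e.g. rows for pipeline stages, columns for tensor parallelism) so that sharding decisions can refer to mesh axes instead of raw ranks. A tiny sketch of the rank-to-coordinate mapping (hypothetical helper, not the DeviceMesh API):

```python
def mesh_coordinate(rank, mesh_shape):
    """Map a flat global rank to its (row, col) logical coordinate
    on a 2D device mesh, laid out row-major."""
    rows, cols = mesh_shape
    assert 0 <= rank < rows * cols
    return divmod(rank, cols)

# 8 GPUs arranged as a 2 x 4 logical mesh
print(mesh_coordinate(5, (2, 4)))
# → (1, 1)
```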
Chunk
- [chunk] add PG check for tensor appending (#1383) by Jiarui Fang
DDP
- [DDP] test ddp state dict uses more strict threshold (#1382) by ver217
Checkpoint
- [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
- [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
- [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
- [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
- [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
- [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang
Util
- [util] standard checkpoint function naming (#1377) by Frank Lee
Nvme
- [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217
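NVMe offload moves optimizer states out of CPU memory onto disk, reading each parameter's state back only for the duration of its update step. A toy sketch of the pattern using a temp directory and pickle (illustrative only, not the tensornvme backend used by CPUAdam/HybridAdam):

```python
import os
import pickle
import tempfile

class DiskOffloadedState:
    """Keep per-parameter optimizer state on disk; load it for one
    update, then write it back, so RAM holds one state at a time."""
    def __init__(self):
        self.dir = tempfile.mkdtemp()

    def _path(self, name):
        return os.path.join(self.dir, name + ".bin")

    def write(self, name, state):
        with open(self._path(name), "wb") as f:
            pickle.dump(state, f)

    def update(self, name, fn):
        with open(self._path(name), "rb") as f:
            state = pickle.load(f)   # fetch state for this param only
        state = fn(state)            # e.g. an Adam moment update
        self.write(name, state)      # spill it back to disk
        return state

store = DiskOffloadedState()
store.write("w1", {"step": 0, "m": 0.0})
new = store.update("w1", lambda s: {**s, "step": s["step"] + 1})
```

The real backend uses asynchronous NVMe I/O to overlap reads and writes with compute, but the lifecycle (load, update, spill) is the same.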
Colotensor
- [colotensor] use cpu memory to store state_dict (#1367) by HELSON
- [colotensor] add Tensor.view op and its unit test (#1343) by HELSON
Unit test
- [unit test] add megatron init test in zero_optim (#1358) by HELSON
Docker
- [docker] add tensornvme in docker (#1354) by ver217
Doc
- [doc] update rst and docstring (#1351) by ver217
Refactor
- [refactor] refactor ColoTensor's unit tests (#1340) by HELSON
Workflow
- [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
- [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
- [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
- [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
- [workflow] updated release bdist workflow (#1318) by Frank Lee
- [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
- [workflow] updated pytorch compatibility test (#1311) by Frank Lee
Test
- [test] removed outdated unit test for meta context (#1329) by Frank Lee
Utils
- [utils] integrated colotensor with lazy init context (#1324) by Frank Lee
Optimizer
- [Optimizer] Remove useless ColoOptimizer (#1312) by Jiarui Fang
- [Optimizer] polish the init method of ColoOptimizer (#1310) by Jiarui Fang
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.8...v0.1.9