v0.1.11rc1

hpcaitech/ColossalAI

Release date: 2022-10-19 11:49:55

Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)

What's Changed

Hotfix

[hotfix] resharding cost issue (#1742) by YuliangLiu0306
[hotfix] solver bug caused by dict type comm cost (#1686) by YuliangLiu0306
[hotfix] fix wrong type name in profiler (#1678) by Boyuan Yao
[hotfix] unit test (#1670) by YuliangLiu0306
[hotfix] add recompile after graph manipulation (#1621) by YuliangLiu0306
[hotfix] got sliced types (#1614) by YuliangLiu0306

Release

[release] update to v0.1.11 (#1736) by Frank Lee

Doc

[doc] update recommendation system catalogue (#1732) by binmakeswell
[doc] update recommendation system urls (#1725) by Jiarui Fang

Zero

[zero] add chunk init function for users (#1729) by HELSON
[zero] add constant placement policy (#1705) by HELSON

Pre-commit

[pre-commit] update pre-commit (#1726) by HELSON

Autoparallel

[autoparallel] runtime_backward_apply (#1720) by YuliangLiu0306
[autoparallel] moved tests to test_tensor_shard (#1713) by Frank Lee
[autoparallel] resnet block runtime apply (#1709) by YuliangLiu0306
[autoparallel] fixed broken node handler tests (#1708) by Frank Lee
[autoparallel] refactored the autoparallel module for organization (#1706) by Frank Lee
[autoparallel] adapt runtime passes (#1703) by YuliangLiu0306
[autoparallel] collated all deprecated files (#1700) by Frank Lee
[autoparallel] init new folder structure (#1696) by Frank Lee
[autoparallel] adapt solver and CostGraph with new handler (#1695) by YuliangLiu0306
[autoparallel] add output handler and placeholder handler (#1694) by YuliangLiu0306
[autoparallel] add pooling handler (#1690) by YuliangLiu0306
[autoparallel] where_handler_v2 (#1688) by YuliangLiu0306
[autoparallel] fix C version rotor inconsistency (#1691) by Boyuan Yao
[autoparallel] added sharding spec conversion for linear handler (#1687) by Frank Lee
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) by YuliangLiu0306
[autoparallel] add unary element wise handler v2 (#1674) by YuliangLiu0306
[autoparallel] add following node generator (#1673) by YuliangLiu0306
[autoparallel] add layer norm handler v2 (#1671) by YuliangLiu0306
[autoparallel] fix insecure subprocess (#1680) by Boyuan Yao
[autoparallel] add rotor C version (#1658) by Boyuan Yao
[autoparallel] added utils for broadcast operation (#1665) by Frank Lee
[autoparallel] update CommSpec (#1667) by YuliangLiu0306
[autoparallel] added bias comm spec to matmul strategy (#1664) by Frank Lee
[autoparallel] add batch norm handler v2 (#1666) by YuliangLiu0306
[autoparallel] remove no strategy nodes (#1652) by YuliangLiu0306
[autoparallel] added compute resharding costs for node handler (#1662) by Frank Lee
[autoparallel] added new strategy constructor template (#1661) by Frank Lee
[autoparallel] added node handler for bmm (#1655) by Frank Lee
[autoparallel] add conv handler v2 (#1663) by YuliangLiu0306
[autoparallel] adapt solver with gpt (#1653) by YuliangLiu0306
[autoparallel] implemented all matmul strategy generator (#1650) by Frank Lee
[autoparallel] change the following nodes strategies generation logic (#1636) by YuliangLiu0306
[autoparallel] where handler (#1651) by YuliangLiu0306
[autoparallel] implemented linear projection strategy generator (#1639) by Frank Lee
[autoparallel] adapt solver with mlp (#1638) by YuliangLiu0306
[autoparallel] Add pofo sequence annotation (#1637) by Boyuan Yao
[autoparallel] add elementwise handler (#1622) by YuliangLiu0306
[autoparallel] add embedding handler (#1620) by YuliangLiu0306
[autoparallel] protect bcast handler from invalid strategies (#1631) by YuliangLiu0306
[autoparallel] add layernorm handler (#1629) by YuliangLiu0306
[autoparallel] recover the merged node strategy index (#1613) by YuliangLiu0306
[autoparallel] added new linear module handler (#1616) by Frank Lee
[autoparallel] added new node handler (#1612) by Frank Lee
[autoparallel] add bcast matmul strategies (#1605) by YuliangLiu0306
[autoparallel] refactored the data structure for sharding strategy (#1610) by Frank Lee
[autoparallel] add bcast op handler (#1600) by YuliangLiu0306
[autoparallel] added all non-bcast matmul strategies (#1603) by Frank Lee
[autoparallel] added strategy generator and bmm strategies (#1602) by Frank Lee
[autoparallel] add reshape handler (#1594) by YuliangLiu0306
[autoparallel] refactored shape consistency to remove redundancy (#1591) by Frank Lee
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) by YuliangLiu0306
[autoparallel] added generate_sharding_spec to utils (#1590) by Frank Lee
[autoparallel] added solver option dataclass (#1588) by Frank Lee
[autoparallel] adapt solver with resnet (#1583) by YuliangLiu0306

Fx/meta/rpc

[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel

Embeddings

[embeddings] add doc in readme (#1711) by Jiarui Fang
[embeddings] more detailed timer (#1692) by Jiarui Fang
[embeddings] cache option (#1635) by Jiarui Fang
[embeddings] use cache_ratio instead of cuda_row_num (#1611) by Jiarui Fang
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) by CsRic

Unittest

[unittest] added doc for the pytest wrapper (#1704) by Frank Lee
[unittest] supported conditional testing based on env var (#1701) by Frank Lee

Embedding

[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) by Jiarui Fang
[embedding] polish async copy (#1657) by Jiarui Fang
[embedding] add more detail profiling (#1656) by Jiarui Fang
[embedding] print profiling results (#1654) by Jiarui Fang
[embedding] non-blocking cpu-gpu copy (#1647) by Jiarui Fang
[embedding] isolate cache_op from forward (#1645) by CsRic
[embedding] rollback for better FAW performance (#1625) by Jiarui Fang
[embedding] updates some default parameters by Jiarui Fang

Fx/profiler

[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
[fx/profiler] provide a table of summary. (#1634) by Super Daniel
[fx/profiler] tuned the calculation of memory estimation (#1619) by Super Daniel

Pipeline/fix-bug

[pipeline/fix-bug] num_microbatches support any integer | stable chimera | launch tool for rpc pp framework (#1684) by Kirigaya Kazuto

Pipeline/rank_recorder

[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681) by Kirigaya Kazuto

Feature

[feature] A new ZeRO implementation (#1644) by HELSON
Revert "[feature] new zero implementation (#1623)" (#1643) by Jiarui Fang
[feature] new zero implementation (#1623) by HELSON

Fx

[fx] Add concrete info prop (#1677) by Boyuan Yao
[fx] refactor code for profiler / enable fake tensor movement. (#1646) by Super Daniel
[fx] fix offload codegen test (#1648) by Boyuan Yao
[fx] Modify offload codegen (#1618) by Boyuan Yao
[fx] PoC of runtime shape consistency application (#1607) by YuliangLiu0306
[fx] Add pofo solver (#1608) by Boyuan Yao
[fx] Add offload codegen (#1598) by Boyuan Yao
[fx] provide an accurate estimation of memory. (#1587) by Super Daniel
[fx] Improve linearize and rotor solver (#1586) by Boyuan Yao
[fx] Add nested checkpoint in activation checkpoint codegen (#1585) by Boyuan Yao

Pipeline/pytree

[pipeline/pytree] add pytree to process args and kwargs | provide data_process_func to process args and kwargs after forward (#1642) by Kirigaya Kazuto

Fix

[fix] fixed the collective pattern name for consistency (#1649) by Frank Lee

MoE

[moe] initialize MoE groups by ProcessGroup (#1640) by HELSON
[moe] fix moe bugs (#1633) by HELSON
[moe] fix MoE bugs (#1628) by HELSON

Tensor

[tensor] use communication autograd func (#1617) by YuliangLiu0306

Pipeline/chimera

[pipeline/chimera] test chimera | fix bug of initializing (#1615) by Kirigaya Kazuto
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) by Kirigaya Kazuto

Workflow

[workflow] deactivate conda environment before removing (#1606) by Frank Lee

Fx/tuning

[fx/tuning] tune performance on rotor with meta info. (#1599) by Super Daniel

Hotfix/rotor

[hotfix/rotor] fix variable names (#1597) by Super Daniel

NFC

[NFC] add OPT serving (#1581) by binmakeswell
[NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) by Boyuan Yao
[NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) by Fazzie-Maqianli
[NFC] polish utils/tensor_detector/__init__.py code style (#1573) by CsRic
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) by Sze-qq
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571) by superhao1995
[NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570) by Jiatong Han
[NFC] polish colossalai/pipeline/utils.py code style (#1562) by Zirui Zhu
[NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) by Xue Fuzhao
[NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) by Zangwei Zheng
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) by DouJS
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) by LuGY
[NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) by BigOneLiXiaoMing
[NFC] polish colossalai/builder/__init__.py code style (#1560) by Ziheng Qin
[NFC] polish colossalai/testing/comparison.py code style. (#1558) by Super Daniel
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) by Ofey Chan
[NFC] polish code colossalai/gemini/update/search_utils.py (#1557) by Kai Wang (Victor Kai)
[NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) by yuxuan-lou
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) by shenggan
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) by Maruyama_Aya
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style by binmakeswell
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) by Kirigaya Kazuto

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.1.11rc1...v0.1.10
