v0.3.5
Release date: 2024-02-23 16:46:07
Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)
What's Changed
Release
- [release] update version (#5380) by Hongxin Liu
Llama
- Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
- [llama] fix memory issue (#5371) by Hongxin Liu
- [llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
- [llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
- [llama] add flash attn patch for npu (#5362) by Hongxin Liu
- [llama] update training script (#5360) by Hongxin Liu
- [llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu
Moe
- [moe] fix tests by ver217
- [moe] fix mixtral optim checkpoint (#5344) by Hongxin Liu
- [moe] fix mixtral forward default value (#5329) by Hongxin Liu
- [moe] fix mixtral checkpoint io (#5314) by Hongxin Liu
- [moe] support mixtral (#5309) by Hongxin Liu
- [moe] update capacity computing (#5253) by Hongxin Liu
- [moe] init mixtral impl by Xuanlei Zhao
- [moe]: fix ep/tp tests, add hierarchical all2all (#4982) by Wenhao Chen
- [moe] support optimizer checkpoint (#5015) by Xuanlei Zhao
- [moe] merge moe into main (#4978) by Xuanlei Zhao
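The MoE entries above add Mixtral support and expert-parallel checkpointing. As a rough illustration only, the following is a minimal sketch of booster-based Mixtral training; the plugin name `MoeHybridParallelPlugin`, its import path, and the `ep_size`/`zero_stage` arguments are assumptions that may differ from this release's actual API.

```python
# Minimal sketch (not the release's exact script) of MoE training with the booster.
# Assumption: the MoE plugin is exposed as MoeHybridParallelPlugin with an
# expert-parallel size argument; verify the import path and arguments per version.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin  # assumed export location
from transformers import MixtralConfig, MixtralForCausalLM

colossalai.launch_from_torch(config={})  # the config argument is dropped in newer releases

plugin = MoeHybridParallelPlugin(tp_size=1, pp_size=1, ep_size=2, zero_stage=1)  # assumed arguments
booster = Booster(plugin=plugin)

config = MixtralConfig(num_hidden_layers=2, hidden_size=256, intermediate_size=512,
                       num_attention_heads=4, num_key_value_heads=4)  # tiny config for illustration
model = MixtralForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model, optimizer, *_ = booster.boost(model, optimizer)
```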
Lr-scheduler
- [lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu
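The LR-scheduler fix (#5369) concerns saving and restoring scheduler state through the booster checkpoint I/O. A minimal sketch, with placeholder paths and a toy model, assuming the standard Booster checkpoint methods:

```python
# Minimal sketch: persisting LR-scheduler state via the Booster checkpoint API.
# Run under torchrun so the distributed launch succeeds; paths are placeholders.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin
from torch.optim.lr_scheduler import CosineAnnealingLR

colossalai.launch_from_torch(config={})  # the config argument is dropped in newer releases
booster = Booster(plugin=TorchDDPPlugin())

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
lr_scheduler = CosineAnnealingLR(optimizer, T_max=1000)
model, optimizer, _, _, lr_scheduler = booster.boost(model, optimizer, lr_scheduler=lr_scheduler)

booster.save_lr_scheduler(lr_scheduler, "ckpt/lr_scheduler.pt")
# ... after a restart, rebuild and boost the same objects, then:
booster.load_lr_scheduler(lr_scheduler, "ckpt/lr_scheduler.pt")
```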
Eval
- [eval] update llama npu eval (#5366) by Camille Zhong
Gemini
- [gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
- [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
- [gemini] fix gemini optimizer: saving Shardformer in Gemini raised "list assignment index out of range" (#5085) by flybird11111
- [gemini] gemini support extra-dp (#5043) by flybird11111
- [gemini] gemini support tensor parallelism. (#4942) by flybird11111
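For the Gemini tensor-parallel and extra-DP support listed above (#4942, #5043), a minimal configuration sketch; the `tp_size`, `extra_dp_size`, and `precision` argument names are taken from the PR descriptions and should be verified against the installed version.

```python
# Minimal sketch: GeminiPlugin with tensor parallelism plus an extra data-parallel group.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})

plugin = GeminiPlugin(tp_size=2, extra_dp_size=2, precision="bf16")  # assumed argument names
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(LlamaConfig(num_hidden_layers=2, hidden_size=256,
                                     intermediate_size=512, num_attention_heads=4,
                                     num_key_value_heads=4))  # tiny config for illustration
optimizer = HybridAdam(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)
```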
Fix
- [fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen
Checkpointio
- [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
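The checkpoint I/O fix (#5347) targets sharded optimizer checkpoints under Gemini and hybrid parallelism. Continuing the Gemini sketch above, the sharded save/load calls look roughly like this; paths are placeholders.

```python
# Continuing the Gemini sketch: sharded checkpointing through the Booster API.
# `shard=True` writes the state as multiple shard files under the given directory.
booster.save_model(model, "ckpt/model", shard=True)
booster.save_optimizer(optimizer, "ckpt/optimizer", shard=True, size_per_shard=1024)
# Restore after rebuilding and boosting the same model/optimizer:
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optimizer")
```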
Chat
- [Chat] fix sft loss nan (#5345) by YeAnbang
Extension
- [extension] fixed exception catch (#5342) by Frank Lee
Doc
- [doc] added docs for extensions (#5324) by Frank Lee
- [doc] add llama2-13B display (#5285) by Desperado-Jia
- [doc] fix doc typo (#5256) by binmakeswell
- [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
- [doc] SwiftInfer release (#5236) by binmakeswell
- [doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
- [doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
- [doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
- [doc] Update required third-party library list for testing and torch compatibility checking (#5207) by Zhongkai Zhao
- [doc] update pytorch version in documents. (#5177) by flybird11111
- [doc] fix colossalqa document (#5146) by Michelle
- [doc] updated paper citation (#5131) by Frank Lee
- [doc] add moe news (#5128) by binmakeswell
Tests
- [tests] fix t5 test. (#5322) by flybird11111
Accelerator
- Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
- [accelerator] fixed npu api by FrankLeeeee
- [accelerator] init the accelerator module (#5129) by Frank Lee
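The accelerator module introduced in #5129 lets code pick the right backend (CUDA or NPU) at runtime instead of hard-coding devices. A minimal sketch:

```python
# Minimal sketch: device-agnostic code via the accelerator module from #5129.
# The same call resolves to the CUDA or NPU backend depending on what is installed.
import torch
from colossalai.accelerator import get_accelerator

accelerator = get_accelerator()
device = accelerator.get_current_device()  # e.g. a cuda or npu device
x = torch.randn(4, 4, device=device)
print(type(accelerator).__name__, x.device)
```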
Workflow
- [workflow] updated CI image (#5318) by Frank Lee
- [workflow] fixed oom tests (#5275) by Frank Lee
- [workflow] fixed incomplete bash command (#5272) by Frank Lee
- [workflow] fixed build CI (#5240) by Frank Lee
Feat
- [feat] refactored extension module (#5298) by Frank Lee
Nfc
- [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
- [nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
- [nfc] fix typo change directoty to directory (#5111) by digger yu
- [nfc] fix typo and author name (#5089) by digger yu
- [nfc] fix typo in docs/ (#4972) by digger yu
Hotfix
- [hotfix] fix 3d plugin test (#5292) by Hongxin Liu
- [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
- [hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
- [hotfix] removed unused flag (#5242) by Frank Lee
- [hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
- [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
- [hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
- [hotfix] Support extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
- [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
- [hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang
Sync
- Merge pull request #5278 from ver217/sync/npu by Frank Lee
Shardformer
- [shardformer] HybridParallelPlugin supports gradient accumulation. (#5246) by flybird11111
- [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
- [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
- [shardformer] fix flash attention: when the mask is causal, just don't unpad it (#5084) by flybird11111
- [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
- [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao
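The Shardformer entries above extend HybridParallelPlugin's model coverage (gpt-j, falcon, Mistral) and add an interleaved pipeline schedule for bert (#5088). A minimal configuration sketch; `pp_style="interleaved"` and `num_model_chunks` are assumed names for the interleaved-schedule options and should be checked against the installed version.

```python
# Minimal sketch: HybridParallelPlugin with tensor + pipeline parallelism.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    num_microbatches=4,
    pp_style="interleaved",   # assumed argument name for the interleaved schedule
    num_model_chunks=2,       # assumed argument name
    enable_flash_attention=True,
)
booster = Booster(plugin=plugin)
```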
Ci
- [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
- [ci] fix shardformer tests. (#5255) by flybird11111
- [ci] fixed ddp test (#5254) by Frank Lee
- [ci] fixed booster test (#5251) by Frank Lee
Npu
- [npu] change device to accelerator api (#5239) by Hongxin Liu
- [npu] use extension for op builder (#5172) by Xuanlei Zhao
- [npu] support triangle attention for llama (#5130) by Xuanlei Zhao
- [npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
- [npu] add npu support for gemini and zero (#5067) by Hongxin Liu
Pipeline
- [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
- [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
- [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
- [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
Format
- [format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5115 (#5118) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5124 (#5125) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5088 (#5127) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 5067 (#5072) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4926 (#5007) by github-actions[bot]
Colossal-llama-2
- [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
- [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen
Devops
- [devops] update torch version in ci (#5217) by Hongxin Liu
Colossaleval
- [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen
Colossalqa
- [colossalqa] fix pangu api (#5170) by Michelle
- [ColossalQA] refactor server and webui & add new feature (#5138) by Michelle
Plugin
- [plugin] fix 3d checkpoint load when booster boosts without an optimizer. (#5135) by flybird11111
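The scenario targeted by #5135 is boosting a model without an optimizer (e.g. for evaluation) and then loading a checkpoint. A minimal sketch; the model config and checkpoint path are placeholders.

```python
# Minimal sketch: boost a model without an optimizer, then load a checkpoint.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})
booster = Booster(plugin=HybridParallelPlugin(tp_size=2, pp_size=1))

model = LlamaForCausalLM(LlamaConfig(num_hidden_layers=2, hidden_size=256,
                                     intermediate_size=512, num_attention_heads=4,
                                     num_key_value_heads=4))
model, *_ = booster.boost(model)          # no optimizer passed
booster.load_model(model, "ckpt/model")   # placeholder path
```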
Feature
- [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) by Zian(Andy) Zheng
- [Feature] Add document retrieval QA (#5020) by YeAnbang
Inference
- [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
- [inference] update examples and engine (#5073) by Xu Kai
- [inference] Refactor inference architecture (#5057) by Xu Kai
- [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai
Hotfix/hybridengine
- [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
- [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia
Misc
- [misc] remove outdated submodule (#5070) by Hongxin Liu
- [misc] add code owners (#5024) by Hongxin Liu
Kernels
- [Kernels] added flash-decoding of Triton (#5063) by Cuiqing Li (李崔卿)
- [Kernels] update Triton kernels to 2.1.0 (#5046) by Cuiqing Li (李崔卿)
Example
- [example] fix llama example's loss error when using the gemini plugin (#5060) by flybird11111
Pipeline,shardformer
- [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.5