v0.3.8
Released: 2024-05-31 19:41:19
Latest hpcaitech/ColossalAI release: v0.4.4 (2024-09-19 10:53:35)
What's Changed
Release
- [release] update version (#5752) by Hongxin Liu
Fix/Example
- [Fix/Example] Fix Llama Inference Loading Data Type (#5763) by Yuanheng Zhao
Gemini
- Merge pull request #5749 from hpcaitech/prefetch by botbw
- Merge pull request #5754 from Hz188/prefetch by botbw
- [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
- [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
- Merge pull request #5733 from Hz188/feature/prefetch by botbw
- Merge pull request #5731 from botbw/prefetch by botbw
- [gemini] init auto policy prefetch by hxwang
- Merge pull request #5722 from botbw/prefetch by botbw
- [gemini] maxprefetch means maximum work to keep by hxwang
- [gemini] use compute_chunk to find next chunk by hxwang
- [gemini] prefetch chunks by hxwang
- [gemini]remove registered gradients hooks (#5696) by flybird11111
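The prefetch and async-reduce commits above share one goal: overlapping chunk communication with compute, so all-gathers and reduce-scatters hide behind forward/backward work. A minimal sketch of the prefetch pattern, with hypothetical `fetch_chunk`/`compute_chunk` placeholders rather than Gemini's actual internals:

```python
import torch

comm_stream = torch.cuda.Stream()

def run_chunks(chunks, fetch_chunk, compute_chunk, max_prefetch=2):
    """Overlap chunk fetches (communication) with per-chunk compute.

    `fetch_chunk` / `compute_chunk` are illustrative placeholders: in
    Gemini the fetch would be an all-gather of a chunk's shards and the
    compute the layer forward that consumes the gathered chunk.
    """
    events = {}

    def prefetch(idx):
        if idx < len(chunks) and idx not in events:
            with torch.cuda.stream(comm_stream):
                fetch_chunk(chunks[idx])   # async w.r.t. the default stream
            ev = torch.cuda.Event()
            ev.record(comm_stream)
            events[idx] = ev

    prefetch(0)
    for i in range(len(chunks)):
        # keep at most `max_prefetch` fetches in flight ahead of compute
        for j in range(i + 1, min(i + 1 + max_prefetch, len(chunks))):
            prefetch(j)
        # default stream waits on chunk i's fetch without a host-side sync
        torch.cuda.current_stream().wait_event(events[i])
        compute_chunk(chunks[i])
```

The `max_prefetch` bound mirrors the "maxprefetch means maximum work to keep" commit: unbounded prefetching would pin too many gathered chunks in GPU memory at once.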
Chore
- [chore] refactor profiler utils by hxwang
- [chore] remove unnecessary assert since compute list might not be recorded by hxwang
- [chore] remove unnecessary test & changes by hxwang
- Merge pull request #5738 from botbw/prefetch by Haze188
- [chore] fix init error by hxwang
- [chore] Update placement_policy.py by botbw
- [chore] remove debugging info by hxwang
- [chore] remove print by hxwang
- [chore] refactor & sync by hxwang
- [chore] sync by hxwang
Bug
- [bug] continue fix by hxwang
- [bug] workaround for idx fix by hxwang
- [bug] fix early return (#5740) by botbw
- [bugs] fix args.profile=False DummyProfiler error by genghaozhe
Inference
- [inference] Fix running time of test_continuous_batching (#5750) by Yuanheng Zhao
- [Inference]Fix readme and example for API server (#5742) by Jianghai
- [inference] release (#5747) by binmakeswell
- [Inference] Fix Inference Generation Config and Sampling (#5710) by Yuanheng Zhao
- [Inference] Fix API server, test and example (#5712) by Jianghai
- [Inference] Delete duplicated copy_vector (#5716) by 傅剑寒
- [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) by yuehuayingxueluo
- [Inference] Add example test_ci script by CjhHa1
- [Inference] Fix bugs and docs for feat/online-server (#5598) by Jianghai
- [Inference] resolve rebase conflicts by CjhHa1
- [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) by Jianghai
- [Inference] ADD async and sync Api server using FastAPI (#5396) by Jianghai
- [Inference] Support the logic related to ignoring EOS token (#5693) by yuehuayingxueluo
- [Inference]Adapt temperature processing logic (#5689) by yuehuayingxueluo
- [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) by Steve Luo
- [Inference] Fix quant bits order (#5681) by 傅剑寒
- [inference]Add alibi to flash attn function (#5678) by yuehuayingxueluo
- [Inference] Adapt Baichuan2-13B TP (#5659) by yuehuayingxueluo
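Several sampling commits above (repetition_penalty, no_repeat_ngram_size, temperature, ignore-EOS) adjust logits before token selection. A minimal sketch of the two simplest processors, following the usual HuggingFace-style semantics rather than this repository's exact code:

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    """logits: [batch, vocab]; generated_ids: [batch, seq] (long).
    HF-style semantics: positive logits of already-generated tokens are
    divided by `penalty`, negative ones multiplied, so penalty > 1.0
    always makes repeats less likely."""
    scores = logits.gather(-1, generated_ids)
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits.scatter(-1, generated_ids, scores)

def apply_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    # temperature -> 0 approaches greedy decoding; 1.0 is a no-op
    return logits / max(temperature, 1e-5)
```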
Feature
- [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
- [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
- Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
- [Feature] qlora support (#5586) by linsj20
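"Auto-cast optimizers to distributed version" means swapping a plain optimizer for one whose states are partitioned across ranks (the Lamb/Galore/CAME/Adafactor work in #5694). A generic ZeRO-1-flavored sketch of the idea; the class and method names here are illustrative, not ColossalAI's API:

```python
import torch
import torch.distributed as dist

class ShardedOptimizer:
    """Illustrative ZeRO-1-style wrapper: each rank holds optimizer
    state only for the parameters it owns, then broadcasts the updated
    weights so every replica stays identical. Gradients are assumed to
    be synchronized (e.g. all-reduced) before step() is called."""

    def __init__(self, params, optimizer_cls, **kwargs):
        self.params = list(params)
        self.rank = dist.get_rank()
        self.world_size = dist.get_world_size()
        # round-robin ownership of parameters across ranks
        owned = self.params[self.rank::self.world_size]
        self.inner = optimizer_cls(owned, **kwargs)  # states live only here

    def step(self):
        self.inner.step()  # updates only this rank's shard
        for i, p in enumerate(self.params):
            dist.broadcast(p.data, src=i % self.world_size)
```

The payoff is that each rank stores roughly 1/world_size of the optimizer state, which matters most for stateful optimizers like Adam-family and CAME.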
Example
- [example] add profile util for llama by hxwang
- [example] Update Inference Example (#5725) by Yuanheng Zhao
Colossal-Inference
- [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
NFC
- [NFC] fix requirements (#5744) by Yuanheng Zhao
- [NFC] Fix code factors on inference triton kernels (#5743) by Yuanheng Zhao
CI
- [ci] Temporary fix for build on pr (#5741) by Yuanheng Zhao
- [ci] Fix example tests (#5714) by Yuanheng Zhao
Sync
- Merge pull request #5737 from yuanheng-zhao/inference/sync/main by Yuanheng Zhao
- [sync] Sync feature/colossal-infer with main by Yuanheng Zhao
- [Sync] Update from main to feature/colossal-infer (Merge pull request #5685) by Yuanheng Zhao
- [sync] resolve conflicts of merging main by Yuanheng Zhao
Shardformer
- [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
- [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
- [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
- Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
- [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
- [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
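The num_heads assert commit above (#5670) exists because tensor parallelism hands each rank a slice of whole attention heads; a head count not divisible by tp_size cannot be partitioned evenly. The check amounts to something like this sketch:

```python
def check_tp_config(num_attention_heads: int, num_key_value_heads: int, tp_size: int):
    # Each TP rank gets num_attention_heads // tp_size whole heads; a
    # remainder would leave some rank with a fractional head.
    assert num_attention_heads % tp_size == 0, (
        f"num_attention_heads ({num_attention_heads}) must be divisible "
        f"by tensor parallel size ({tp_size})"
    )
    # GQA models (e.g. Llama, Qwen2) must also split their KV heads evenly.
    assert num_key_value_heads % tp_size == 0, (
        f"num_key_value_heads ({num_key_value_heads}) must be divisible "
        f"by tensor parallel size ({tp_size})"
    )
```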
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot] (×6)
Doc
- [doc] Update Inference Readme (#5736) by Yuanheng Zhao
Fix/Inference
- [Fix/Inference] Add unsupported auto-policy error message (#5730) by Yuanheng Zhao
Lazy
- [lazy] fix lazy cls init (#5720) by flybird11111
Misc
- [misc] Update PyTorch version in docs (#5724) by binmakeswell
- [misc] Update PyTorch version in docs (#5711) by Edenzzzz
- [misc] Add an existing issue checkbox in bug report (#5691) by Edenzzzz
- [misc] refactor launch API and tensor constructor (#5666) by Hongxin Liu
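Per the launch-API refactor above (#5666), the legacy config argument is gone from the launch helpers. Assuming the post-refactor signature (worth verifying against the release docs), distributed init under torchrun reduces to:

```python
import colossalai
import torch

# Under torchrun, rank and world size come from environment variables,
# so no config dict is needed after the refactor (an assumption based
# on the commit title; check the current docs for exact arguments).
colossalai.launch_from_torch()
device = torch.device("cuda", torch.cuda.current_device())
```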
Colossal-LLaMA
- [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li
Fix
- [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
- [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
- [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao
Feat
- [Feat]Inference RPC Server Support (#5705) by Runyu Lu
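The RPC server feature (#5705) separates the request-handling frontend from the worker processes that hold the model. A minimal sketch of that general pattern using torch.distributed.rpc; names and topology are illustrative, not the repository's actual server:

```python
import torch.distributed.rpc as rpc

def generate_on_worker(prompt: str) -> str:
    # placeholder for the worker-side model call
    return f"echo: {prompt}"

def run(rank: int, world_size: int):
    # requires MASTER_ADDR / MASTER_PORT in the environment
    if rank == 0:
        rpc.init_rpc("frontend", rank=rank, world_size=world_size)
        # frontend forwards a request to the worker holding the model
        out = rpc.rpc_sync("worker1", generate_on_worker, args=("hello",))
        print(out)
    else:
        rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    rpc.shutdown()  # blocks until all outstanding RPCs complete
```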
Hotfix
- [hotfix] fix inference typo (#5438) by hugo-syn
- [hotfix] fix OpenMOE example import path (#5697) by Yuanheng Zhao
- [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) by Yuanheng Zhao
Inference/Feat
- [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) by 傅剑寒
- [Inference/Feat] Add quant kvcache interface (#5700) by 傅剑寒
- [Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) by 傅剑寒
- [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) by 傅剑寒
- [Inference/Feat] Feat quant kvcache step2 (#5674) by 傅剑寒
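The kvcache-quant series above stores K/V in a narrower 8-bit float and widens it back at attention time. A sketch with PyTorch's fp8 storage dtype (requires torch >= 2.1; the PRs do this inside fused CUDA kernels instead):

```python
import torch

def quantize_kv(kv: torch.Tensor) -> torch.Tensor:
    # fp8 e5m2 keeps fp16's exponent range at half the memory:
    # the cache stores fp8, attention reads it back as fp16.
    return kv.to(torch.float8_e5m2)

def dequantize_kv(kv_fp8: torch.Tensor) -> torch.Tensor:
    return kv_fp8.to(torch.float16)

k = torch.randn(1, 8, 128, 64, dtype=torch.float16)
k_cache = quantize_kv(k)               # half the memory of fp16
assert k_cache.element_size() == 1
k_restored = dequantize_kv(k_cache)    # lossy, but usually tolerable for KV
```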
Online Server
- [Online Server] Chat Api for streaming and not streaming response (#5470) by Jianghai
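The Chat API above serves both response modes named in the commit title. A minimal FastAPI sketch of the streaming/non-streaming split; the endpoint path, parameters, and generator are illustrative, not the repository's server:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def fake_generate(prompt: str):
    # stand-in for the model; yields tokens one at a time
    for tok in ("Hello", ",", " world"):
        yield tok

@app.post("/v1/chat")
async def chat(prompt: str, stream: bool = False):
    if stream:
        # server-sent-events style chunked response
        return StreamingResponse(
            (f"data: {tok}\n\n" for tok in fake_generate(prompt)),
            media_type="text/event-stream",
        )
    return {"text": "".join(fake_generate(prompt))}
```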
Zero
- [zero]remove registered gradients hooks (#5687) by flybird11111
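This commit and its Gemini counterpart (#5696) both retire per-parameter backward hooks in favor of explicit reduction. For reference, the generic hook pattern being moved away from looks roughly like this sketch (not the removed code):

```python
import torch
import torch.distributed as dist

def register_grad_reduce_hooks(model: torch.nn.Module):
    """Generic hook pattern: once a parameter's gradient is fully
    accumulated in backward, launch an async all-reduce for it. All
    returned handles must be waited on (and the list cleared) before
    optimizer.step(). Assumes torch >= 2.1 for the post-accumulate
    hook and NCCL >= 2.10 for ReduceOp.AVG."""
    handles = []

    def hook(param):
        handles.append(
            dist.all_reduce(param.grad, op=dist.ReduceOp.AVG, async_op=True)
        )

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(hook)
    return handles
```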
Kernel
- [kernel] Support New KCache Layout - Triton Kernel (#5677) by Yuanheng Zhao
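The new KCache layout (#5677) reorganizes how each decoding step's keys land in the paged cache so the Triton kernel can read them coalesced. A plain-PyTorch sketch of a blocked-cache copy; all shapes and the layout itself are illustrative:

```python
import torch

def copy_k_to_blocked_cache(k_step, k_cache, block_tables, seq_lens, block_size):
    """Write the current decoding step's keys into a paged cache.

    k_step:       [batch, num_heads, head_dim]   -- this step's keys
    k_cache:      [num_blocks, num_heads, block_size, head_dim]
    block_tables: [batch, max_blocks_per_seq]    -- logical -> physical blocks
    seq_lens:     [batch]                        -- lengths incl. this step
    """
    for b in range(k_step.size(0)):
        pos = seq_lens[b].item() - 1              # index of the new token
        block = block_tables[b, pos // block_size]
        k_cache[block, :, pos % block_size, :] = k_step[b]
```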
Inference/Kernel
- [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) by Steve Luo
LowLevelZero
- [LowLevelZero] low level zero support lora (#5153) by flybird11111
LoRA
- [lora] add lora APIs for booster, support lora for TorchDDP (#4981) by Baizhou Zhang
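The LoRA commits above (and the qlora work in #5586 under Feature) attach trainable low-rank adapters to frozen base weights. The core computation is small enough to show whole; this is the generic technique, not the booster-facing API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B(A(x)); only A and B train."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)    # freeze the full weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # adapters start as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```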
Devops
- [devops] fix release docker ci (#5665) by Hongxin Liu
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.7...v0.3.8