v0.0.6
Release date: 2023-08-25 21:30:09
Highlights
- Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind
- Support tensor parallelism for W4A16
- Add OpenAI-like RESTful API (see the client sketch after this list)
- Support Llama-2 70B 4-bit quantization
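The new RESTful API follows the OpenAI convention, so a generic HTTP client can talk to it. Below is a minimal Python sketch; the host, port, endpoint path, and model name are illustrative assumptions, not lmdeploy's confirmed defaults.

```python
# Minimal sketch of calling the OpenAI-like RESTful API added in this
# release. Host, port, route, and model name are assumptions based on
# the OpenAI convention; check the server docs for the exact values.
import requests

API_BASE = "http://localhost:23333"  # assumed host/port

payload = {
    "model": "internlm-chat-7b",  # assumed model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
}

resp = requests.post(f"{API_BASE}/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```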
What's Changed
🚀 Features
- Profiling tool for huggingface and deepspeed models by @wangruohui in https://github.com/InternLM/lmdeploy/pull/161
- Support the Windows platform by @irexyc in https://github.com/InternLM/lmdeploy/pull/209
- Support dynamic NTK scaling and logN scaling for Qwen-7B in turbomind by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/230 (see the sketch after this list)
- Add RESTful API by @AllentDan in https://github.com/InternLM/lmdeploy/pull/223
- Support context decoding with DP in pytorch by @wangruohui in https://github.com/InternLM/lmdeploy/pull/193
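For background on the Qwen-7B feature above: dynamic NTK scaling enlarges the RoPE base once the context outgrows the training length, and logN scaling damps query magnitudes at long positions. A minimal sketch of the commonly used formulas follows; the actual turbomind kernels may differ in detail.

```python
# Conceptual sketch of the two long-context tricks this release enables
# for Qwen-7B in turbomind. Formulas follow the common dynamic-NTK and
# logN definitions; not a description of turbomind's internals.
import math

def dynamic_ntk_base(base: float, dim: int, seq_len: int,
                     train_len: int, scale: float = 1.0) -> float:
    """Grow the RoPE base once the context exceeds the training length."""
    if seq_len <= train_len:
        return base
    alpha = scale * seq_len / train_len - (scale - 1)
    return base * alpha ** (dim / (dim - 2))

def logn_scale(pos: int, train_len: int) -> float:
    """Scale factor applied to the query at position `pos` (>= 1.0)."""
    return max(1.0, math.log(pos) / math.log(train_len))

# Example: extrapolating a model trained on 2k tokens to an 8k context.
print(dynamic_ntk_base(10000.0, 128, 8192, 2048))  # enlarged RoPE base
print(logn_scale(8192, 2048))                      # ~1.18
```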
💥 Improvements
- Support TP for W4A16 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/262 (see the sketch after this list)
- Pass chat template args including meta_prompt to the model (https://github.com/InternLM/lmdeploy/commit/7785142d7c13a21bc01c2e7c0bc10b82964371f1) by @AllentDan in https://github.com/InternLM/lmdeploy/pull/225
- Enable the Gradio server to call inference services through the RESTful API by @AllentDan in https://github.com/InternLM/lmdeploy/pull/287
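TP for W4A16 (#262) means each rank must dequantize its own shard independently, so the packed 4-bit weights and their per-group scales/zeros have to be partitioned along the same axis. The sketch below illustrates a column-parallel split under assumed shapes and group size; it is not turbomind's actual layout.

```python
# Conceptual sketch: sharding a W4A16 linear layer across TP ranks.
# Shapes and group size are illustrative assumptions.
import numpy as np

def shard_w4a16(qweight: np.ndarray, scales: np.ndarray,
                zeros: np.ndarray, tp: int, rank: int):
    """Column-parallel split: slice the output dimension for one TP rank."""
    out_features = qweight.shape[1]
    assert out_features % tp == 0, "output dim must be divisible by TP"
    step = out_features // tp
    cols = slice(rank * step, (rank + 1) * step)
    return qweight[:, cols], scales[:, cols], zeros[:, cols]

# Toy example: in=256, out=512, group_size=128 -> 2 scale rows.
qweight = np.zeros((256 // 8, 512), dtype=np.int32)   # 8 int4 values per int32
scales = np.ones((256 // 128, 512), dtype=np.float16)
zeros = np.zeros((256 // 128, 512), dtype=np.float16)
w, s, z = shard_w4a16(qweight, scales, zeros, tp=2, rank=0)
print(w.shape, s.shape, z.shape)  # (32, 256) (2, 256) (2, 256)
```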
🐞 Bug fixes
- Adjust dependency of gradio server by @AllentDan in https://github.com/InternLM/lmdeploy/pull/236
- Implement `movmatrix` using warp shuffling for CUDA < 11.8 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/267
- Add 'accelerate' to the requirement list by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/261
- Fix building with CUDA 11.3 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/280
- Pad tok_embedding and output weights to make their shapes divisible by TP by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/285 (see the sketch after this list)
- Fix llama2 70b & qwen quantization error by @pppppM in https://github.com/InternLM/lmdeploy/pull/273
- Import turbomind in gradio server only when it is needed by @AllentDan in https://github.com/InternLM/lmdeploy/pull/303
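On the padding fix (#285): when the vocabulary size is not divisible by the TP degree, the embedding and output weights cannot be split into equal shards. A minimal sketch of the idea follows, assuming a simple row-major embedding weight; names are illustrative, not lmdeploy's internals.

```python
# Minimal sketch of padding a token-embedding weight so its vocab
# dimension is divisible by the TP degree. Illustrative only.
import numpy as np

def pad_vocab_for_tp(weight: np.ndarray, tp: int) -> np.ndarray:
    """Pad dim 0 (vocab) with zero rows up to the next multiple of `tp`."""
    vocab, hidden = weight.shape
    pad = (-vocab) % tp
    if pad == 0:
        return weight
    return np.concatenate([weight, np.zeros((pad, hidden), weight.dtype)])

emb = np.random.randn(32001, 128).astype(np.float32)  # odd vocab size
padded = pad_vocab_for_tp(emb, tp=2)
print(padded.shape)  # (32002, 128)
```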
📚 Documentations
- Remove specified version in user guide by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/241
- docs(quantization): update description by @tpoisonooo in https://github.com/InternLM/lmdeploy/pull/253 and https://github.com/InternLM/lmdeploy/pull/272
- Check-in FAQ by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/256
- Add readthedocs by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/208
🌐 Other
- Update workflow for building docker image by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/282
- Change to github-hosted runner for building docker image by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/291
Known issues
- Inference with the 4-bit Qwen-7B model fails. #307 is addressing this issue.
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.0.5...v0.0.6