v0.3.0
Release date: 2024-04-03 09:55:44
## Highlights
- Refactored attention and optimized GQA (#1258, #1307, #1116), achieving 22+ RPS for internlm2-7b and 16+ RPS for internlm2-20b, about 1.8x faster than vLLM
- Supported new models, including Qwen1.5-MoE (#1372), DBRX (#1367), and DeepSeek-VL (#1335)
## What's Changed
### 🚀 Features
- Add tensor core GQA dispatch for `[4, 5, 6, 8]` by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1258
- Upgrade turbomind to v2.1 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1307, https://github.com/InternLM/lmdeploy/pull/1116
- Support slora to pipeline by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1286
- Support qwen for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1265
- Support Triton inference server python backend by @ispobock in https://github.com/InternLM/lmdeploy/pull/1329
- Support DBRX in the torch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1367
- Support qwen2 moe for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1372
- Add deepseek vl by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1335
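The SLoRA pipeline support above (#1286) can be sketched roughly as follows. This is an illustrative sketch, not the documented API: the model ID, adapter name, and adapter path are placeholders, and the exact config fields (`adapters`, `adapter_name`) should be checked against the lmdeploy documentation for this release before use.

```python
# Hypothetical sketch: serving a base model plus a LoRA adapter through the
# pipeline API. Requires a GPU and downloaded weights; paths are placeholders.
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

# Register one or more LoRA adapters with the pytorch engine
# (adapter name -> path to the adapter weights).
backend_config = PytorchEngineConfig(
    adapters={'my_lora': '/path/to/lora/adapter'})
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)

# Select which adapter to apply for a given request.
gen_config = GenerationConfig(adapter_name='my_lora')
print(pipe('Hello, please introduce yourself.', gen_config=gen_config))
```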
### 💥 Improvements
- Remove unused variable by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1256
- Expose cache_block_seq_len to API by @ispobock in https://github.com/InternLM/lmdeploy/pull/1218
- add chat template for deepseek coder model by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1310
- Add more log info for api_server by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1323
- Remove CUDA cache after loading the vision model by @irexyc in https://github.com/InternLM/lmdeploy/pull/1325
- Add new chat cli with auto backend feature by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1276
- Update rewritings for qwen by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1351
- lazy import accelerate.init_empty_weights for vl async engine by @irexyc in https://github.com/InternLM/lmdeploy/pull/1359
- update lmdeploy pypi packages deps to cuda12 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1368
- Update `max_prefill_token_num` for low GPU memory by @grimoire in https://github.com/InternLM/lmdeploy/pull/1373
- Optimize pipeline of pytorch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1328
### 🐞 Bug fixes
- fix different stop/bad words length in batch by @irexyc in https://github.com/InternLM/lmdeploy/pull/1246
- Fix performance issue of chatbot by @ispobock in https://github.com/InternLM/lmdeploy/pull/1295
- add missed argument by @irexyc in https://github.com/InternLM/lmdeploy/pull/1317
- Fix dlpack memory leak by @ispobock in https://github.com/InternLM/lmdeploy/pull/1344
- Fix invalid context for Internstudio platform by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1354
- fix benchmark generation by @grimoire in https://github.com/InternLM/lmdeploy/pull/1349
- fix window attention by @grimoire in https://github.com/InternLM/lmdeploy/pull/1341
- fix batchApplyRepetitionPenalty by @irexyc in https://github.com/InternLM/lmdeploy/pull/1358
- Fix memory leak of DLManagedTensor by @ispobock in https://github.com/InternLM/lmdeploy/pull/1361
- fix vlm inference hung with tp by @irexyc in https://github.com/InternLM/lmdeploy/pull/1336
- [Fix] fix the unit test of model name deduce by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1382
### 📚 Documentation
- add citation in readme by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1308
- Add slora example for pipeline by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1343
### 🌐 Other
- Add restful interface regression daily test workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1302
- Add offline mode for testcase workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1318
- workflow bugfix and add llava-v1.5-13b testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1339
- Add benchmark test workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1364
- bump version to v0.3.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1387
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.2.6...v0.3.0
1. lmdeploy-0.3.0+cu118-cp310-cp310-manylinux2014_x86_64.whl (73.81 MB)
2. lmdeploy-0.3.0+cu118-cp310-cp310-win_amd64.whl (52.75 MB)
3. lmdeploy-0.3.0+cu118-cp311-cp311-manylinux2014_x86_64.whl (73.82 MB)
4. lmdeploy-0.3.0+cu118-cp311-cp311-win_amd64.whl (52.75 MB)
5. lmdeploy-0.3.0+cu118-cp38-cp38-manylinux2014_x86_64.whl (73.82 MB)
6. lmdeploy-0.3.0+cu118-cp38-cp38-win_amd64.whl (52.75 MB)
7. lmdeploy-0.3.0+cu118-cp39-cp39-manylinux2014_x86_64.whl (73.81 MB)
8. lmdeploy-0.3.0+cu118-cp39-cp39-win_amd64.whl (52.74 MB)
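To install one of the wheels above, pick the file matching your CPython version and platform (the `cu118` tag indicates a build against CUDA 11.8). For example, on Linux with Python 3.10:

```shell
# Example only: substitute the wheel file that matches your Python
# version (cp38/cp39/cp310/cp311) and OS (manylinux2014 / win_amd64).
pip install lmdeploy-0.3.0+cu118-cp310-cp310-manylinux2014_x86_64.whl
```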