v0.3.0
Release date: 2024-04-03 09:55:44
## Highlights
- Refactored attention and optimized GQA (#1258, #1307, #1116), achieving 22+ RPS for internlm2-7b and 16+ RPS for internlm2-20b, about 1.8x faster than vLLM
- Supported new models, including Qwen1.5-MoE (#1372), DBRX (#1367), and DeepSeek-VL (#1335)
## What's Changed
### 🚀 Features
- Add tensor core GQA dispatch for `[4, 5, 6, 8]` by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1258
- Upgrade turbomind to v2.1 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1307, https://github.com/InternLM/lmdeploy/pull/1116
- Support slora to pipeline by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1286
- Support qwen for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1265
- Support Triton inference server python backend by @ispobock in https://github.com/InternLM/lmdeploy/pull/1329
- Support DBRX in the torch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1367
- Support qwen2 moe for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1372
- Add deepseek vl by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1335
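The SLoRA pipeline support above (#1286) can be sketched roughly as follows. This is an illustrative sketch, not the documented API: the model ID, adapter name, and adapter path are placeholders, and the exact config fields (`adapters`, `adapter_name`) should be checked against the lmdeploy documentation for this release before use.

```python
# Hypothetical sketch: serving a base model plus a LoRA adapter through the
# pipeline API. Requires a GPU and downloaded weights; paths are placeholders.
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

# Register one or more LoRA adapters with the pytorch engine
# (adapter name -> path to the adapter weights).
backend_config = PytorchEngineConfig(
    adapters={'my_lora': '/path/to/lora/adapter'})
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)

# Select which adapter to apply for a given request.
gen_config = GenerationConfig(adapter_name='my_lora')
print(pipe('Hello, please introduce yourself.', gen_config=gen_config))
```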
### 💥 Improvements
- Remove unused variable by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1256
- Expose cache_block_seq_len to API by @ispobock in https://github.com/InternLM/lmdeploy/pull/1218
- add chat template for deepseek coder model by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1310
- Add more log info for api_server by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1323
- Remove CUDA cache after loading the vision model by @irexyc in https://github.com/InternLM/lmdeploy/pull/1325
- Add new chat cli with auto backend feature by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1276
- Update rewritings for qwen by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1351
- lazy import accelerate.init_empty_weights for vl async engine by @irexyc in https://github.com/InternLM/lmdeploy/pull/1359
- update lmdeploy pypi packages deps to cuda12 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1368
- Update `max_prefill_token_num` for low GPU memory by @grimoire in https://github.com/InternLM/lmdeploy/pull/1373
- Optimize pipeline of pytorch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1328
### 🐞 Bug fixes
- fix different stop/bad words length in batch by @irexyc in https://github.com/InternLM/lmdeploy/pull/1246
- Fix performance issue of chatbot by @ispobock in https://github.com/InternLM/lmdeploy/pull/1295
- add missed argument by @irexyc in https://github.com/InternLM/lmdeploy/pull/1317
- Fix dlpack memory leak by @ispobock in https://github.com/InternLM/lmdeploy/pull/1344
- Fix invalid context for Internstudio platform by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1354
- fix benchmark generation by @grimoire in https://github.com/InternLM/lmdeploy/pull/1349
- fix window attention by @grimoire in https://github.com/InternLM/lmdeploy/pull/1341
- fix batchApplyRepetitionPenalty by @irexyc in https://github.com/InternLM/lmdeploy/pull/1358
- Fix memory leak of DLManagedTensor by @ispobock in https://github.com/InternLM/lmdeploy/pull/1361
- fix vlm inference hung with tp by @irexyc in https://github.com/InternLM/lmdeploy/pull/1336
- [Fix] fix the unit test of model name deduce by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1382
### 📚 Documentation
- add citation in readme by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1308
- Add slora example for pipeline by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1343
### 🌐 Other
- Add restful interface regression daily test workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1302
- Add offline mode for testcase workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1318
- workflow bugfix and add llava-v1.5-13b testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1339
- Add benchmark test workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1364
- bump version to v0.3.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1387
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.2.6...v0.3.0
1. lmdeploy-0.3.0+cu118-cp310-cp310-manylinux2014_x86_64.whl (73.81 MB)
2. lmdeploy-0.3.0+cu118-cp310-cp310-win_amd64.whl (52.75 MB)
3. lmdeploy-0.3.0+cu118-cp311-cp311-manylinux2014_x86_64.whl (73.82 MB)
4. lmdeploy-0.3.0+cu118-cp311-cp311-win_amd64.whl (52.75 MB)
5. lmdeploy-0.3.0+cu118-cp38-cp38-manylinux2014_x86_64.whl (73.82 MB)
6. lmdeploy-0.3.0+cu118-cp38-cp38-win_amd64.whl (52.75 MB)
7. lmdeploy-0.3.0+cu118-cp39-cp39-manylinux2014_x86_64.whl (73.81 MB)
8. lmdeploy-0.3.0+cu118-cp39-cp39-win_amd64.whl (52.74 MB)
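To install one of the wheels above, pick the file matching your CPython version and platform (the `cu118` tag indicates a build against CUDA 11.8). For example, on Linux with Python 3.10:

```shell
# Example only: substitute the wheel file that matches your Python
# version (cp38/cp39/cp310/cp311) and OS (manylinux2014 / win_amd64).
pip install lmdeploy-0.3.0+cu118-cp310-cp310-manylinux2014_x86_64.whl
```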