# v0.4.2
Release date: 2024-05-27 16:56:15
## Highlight
- Support 4-bit weight-only quantization and inference of VLMs, such as InternVL v1.5, LLaVA, and InternLM-XComposer2
  Quantization:

  ```shell
  lmdeploy lite auto_awq OpenGVLab/InternVL-Chat-V1-5 --work-dir ./InternVL-Chat-V1-5-AWQ
  ```

  Inference with the quantized model:

  ```python
  from lmdeploy import pipeline, TurbomindEngineConfig
  from lmdeploy.vl import load_image

  pipe = pipeline('./InternVL-Chat-V1-5-AWQ',
                  backend_config=TurbomindEngineConfig(tp=1, model_format='awq'))
  img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
  out = pipe(('describe this image', img))
  print(out)
  ```
- Balance the vision model weights across GPUs when deploying VLMs on multiple GPUs (see the combined sketch below)
  ```python
  from lmdeploy import pipeline, TurbomindEngineConfig
  from lmdeploy.vl import load_image

  pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',
                  backend_config=TurbomindEngineConfig(tp=2))
  img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
  out = pipe(('describe this image', img))
  print(out)
  ```
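The two highlights compose. A minimal sketch (not from the release notes) that loads the AWQ checkpoint produced in the quantization step above while balancing the vision weights across two GPUs; the `./InternVL-Chat-V1-5-AWQ` workspace is assumed to exist:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# 4-bit AWQ weights (model_format='awq') with the vision model
# balanced across two GPUs (tp=2)
pipe = pipeline('./InternVL-Chat-V1-5-AWQ',
                backend_config=TurbomindEngineConfig(tp=2, model_format='awq'))
img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(pipe(('describe this image', img)))
```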
## What's Changed

### 🚀 Features
- PyTorch engine hash-table-based prefix caching by @grimoire in https://github.com/InternLM/lmdeploy/pull/1429 (see the sketch after this list)
- support phi3 by @grimoire in https://github.com/InternLM/lmdeploy/pull/1497
- Turbomind prefix caching by @ispobock in https://github.com/InternLM/lmdeploy/pull/1450
- Enable search scale for awq by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1545
- [Feature] Support vl models quantization by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1553
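The prefix-caching items above (PyTorch engine in #1429, TurboMind in #1450) cache the KV blocks of shared prompt prefixes so a repeated system prompt is not recomputed per request. A minimal sketch, assuming the feature is switched on through an `enable_prefix_caching` field on the engine configs:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# `enable_prefix_caching` is assumed to be the switch added by the PRs
# above; PytorchEngineConfig is assumed to expose the same field.
pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(enable_prefix_caching=True))

# Both prompts share the same prefix, so the second request can reuse
# the cached KV blocks of the system prompt.
system = 'You are a concise assistant.\n'
print(pipe([system + 'What is prefix caching?',
            system + 'Why does it speed up serving?']))
```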
### 💥 Improvements
- make Qwen compatible with Slora when TP > 1 by @jjjjohnson in https://github.com/InternLM/lmdeploy/pull/1518
- Optimize slora by @grimoire in https://github.com/InternLM/lmdeploy/pull/1447
- Use a faster format for images in VLMs by @isidentical in https://github.com/InternLM/lmdeploy/pull/1575
- add chat-template args to chat cli by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1566
- Get the max session len from config.json by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1550
- Optimize w8a8 kernel by @grimoire in https://github.com/InternLM/lmdeploy/pull/1353
- support python 3.12 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1605
- Optimize moe by @grimoire in https://github.com/InternLM/lmdeploy/pull/1520
- Balance vision model weights on multiple GPUs by @irexyc in https://github.com/InternLM/lmdeploy/pull/1591
- Support user-specified IMAGE_TOKEN position for the deepseek-vl model by @irexyc in https://github.com/InternLM/lmdeploy/pull/1627 (see the sketch after this list)
- Optimize GQA/MQA by @grimoire in https://github.com/InternLM/lmdeploy/pull/1649
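A minimal sketch of the user-specified IMAGE_TOKEN position from the deepseek-vl item above, assuming `IMAGE_TOKEN` is the placeholder exported by `lmdeploy.vl.constants`:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN  # assumed export

pipe = pipeline('deepseek-ai/deepseek-vl-7b-chat')
img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

# Put the image token exactly where you want it in the prompt instead
# of relying on the default insertion position.
prompt = f'{IMAGE_TOKEN}\nWhat animal appears in the image above?'
print(pipe((prompt, img)))
```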
### 🐞 Bug fixes
- fix logger init by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1598
- Bugfix: gen_config wrongly assigned `True` by @thelongestusernameofall in https://github.com/InternLM/lmdeploy/pull/1594
- Enable split-kv for attention by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1606
- Fix xcomposer2 vision model process by @irexyc in https://github.com/InternLM/lmdeploy/pull/1640
- Fix NTK scaling by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1636
- Fix illegal memory access when seq_len < 64 by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1616
- Fix llava vl template by @irexyc in https://github.com/InternLM/lmdeploy/pull/1620
- [side-effect] fix deepseek-vl when tp is 1 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1648
- fix logprobs output by @irexyc in https://github.com/InternLM/lmdeploy/pull/1561 (see the sketch after this list)
- fix fused-moe in triton2.2.0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/1654
- Align tokenizers in pipeline and api_server benchmark scripts by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1650
- [side-effect] fix UnboundLocalError for internlm-xcomposer2-4khd-7b by @irexyc in https://github.com/InternLM/lmdeploy/pull/1661
- remove paged attention prefill autotune by @grimoire in https://github.com/InternLM/lmdeploy/pull/1658
- Fix prompts possibly differing after an encode-decode round trip with transformers 4.41.0 by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1617
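The logprobs fix above can be exercised through `GenerationConfig`. A minimal sketch, assuming `logprobs` takes the number of top log-probabilities to return per generated token and that the response exposes them as `.logprobs`:

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2-chat-7b')

# Ask for the top-5 log-probabilities of every generated token.
gen_config = GenerationConfig(max_new_tokens=32, logprobs=5)
resp = pipe(['What is 2 + 2?'], gen_config=gen_config)[0]
print(resp.text)
print(resp.logprobs)  # assumed: per-token {token_id: logprob} mappings
```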
### 📚 Documentations
- Fix typo in w8a8.md by @chg0901 in https://github.com/InternLM/lmdeploy/pull/1568
- Update doc for prefix caching by @ispobock in https://github.com/InternLM/lmdeploy/pull/1597
- Update VL document by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1657
### 🌐 Other
- remove first empty token check and add input validation testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1549
- add more model into benchmark and evaluate workflow by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1565
- add vl awq testcase and refactor pipeline testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1630
- bump version to v0.4.2 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1644
## New Contributors
- @isidentical made their first contribution in https://github.com/InternLM/lmdeploy/pull/1575
- @chg0901 made their first contribution in https://github.com/InternLM/lmdeploy/pull/1568
- @thelongestusernameofall made their first contribution in https://github.com/InternLM/lmdeploy/pull/1594
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.4.1...v0.4.2
## Assets

1. lmdeploy-0.4.2+cu118-cp310-cp310-manylinux2014_x86_64.whl (70.59 MB)
2. lmdeploy-0.4.2+cu118-cp310-cp310-win_amd64.whl (48.61 MB)
3. lmdeploy-0.4.2+cu118-cp311-cp311-manylinux2014_x86_64.whl (70.61 MB)
4. lmdeploy-0.4.2+cu118-cp311-cp311-win_amd64.whl (48.61 MB)
5. lmdeploy-0.4.2+cu118-cp312-cp312-manylinux2014_x86_64.whl (70.62 MB)
6. lmdeploy-0.4.2+cu118-cp312-cp312-win_amd64.whl (48.61 MB)
7. lmdeploy-0.4.2+cu118-cp38-cp38-manylinux2014_x86_64.whl (70.61 MB)
8. lmdeploy-0.4.2+cu118-cp38-cp38-win_amd64.whl (48.61 MB)
9. lmdeploy-0.4.2+cu118-cp39-cp39-manylinux2014_x86_64.whl (70.59 MB)
10. lmdeploy-0.4.2+cu118-cp39-cp39-win_amd64.whl (48.6 MB)