
v0.4.0

InternLM/lmdeploy

Release date: 2024-04-23 19:18:37

Latest InternLM/lmdeploy release: v0.6.0a0 (2024-08-26 17:12:19)

Highlights

Support for Llama3 and additional Vision-Language Models (VLMs)

Introduce online int4/int8 KV quantization and inference
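Online KV quantization stores each key/value vector as low-bit integers when it is written to the cache and dequantizes it when attention reads it back. The sketch below illustrates the general idea in pure Python, assuming asymmetric (scale plus zero-point) min/max quantization applied to one vector at a time. The function names, the per-vector granularity and the random test data are hypothetical and for illustration only; this is not LMDeploy's kernel code.

```python
import random


def quantize(values, bits):
    """Asymmetric min/max quantization of one vector to unsigned `bits`-bit ints."""
    qmax = (1 << bits) - 1                    # 255 for 8-bit, 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = lo                                 # zero point stored next to the scale
    q = [round((v - zero) / scale) for v in values]
    q = [min(max(x, 0), qmax) for x in q]     # clamp into the valid integer range
    return q, scale, zero


def dequantize(q, scale, zero):
    """Map the stored integers back to approximate floating-point values."""
    return [x * scale + zero for x in q]


def max_abs_error(values, bits):
    """Largest reconstruction error after a quantize/dequantize round trip."""
    q, scale, zero = quantize(values, bits)
    restored = dequantize(q, scale, zero)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical key vector with head_dim = 128
    key = [random.gauss(0.0, 1.0) for _ in range(128)]
    for bits in (8, 4):
        print(f"{bits}-bit: max abs error = {max_abs_error(key, bits):.4f}")
```

With 4 bits the grid has only 16 levels instead of 256, so the worst-case rounding error (half a step, `scale / 2`) is about 17 times larger than with 8 bits. In exchange, each cached element takes half the space.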

The following table shows evaluation results for three LLMs at different KV cache precisions:

                                     llama2-7b-chat             internlm2-chat-7b          qwen1.5-7b-chat
dataset      version  metric         kv fp16  kv int8  kv int4  kv fp16  kv int8  kv int4  kv fp16  kv int8  kv int4
ceval        -        naive_average  28.42    27.96    27.58    60.45    60.88    60.28    70.56    70.49    68.62
mmlu         -        naive_average  35.64    35.58    34.79    63.91    64       62.36    61.48    61.56    60.65
triviaqa     2121ce   score          56.09    56.13    53.71    58.73    58.7     58.18    44.62    44.77    44.04
gsm8k        1d7fe4   accuracy       28.2     28.05    27.37    70.13    69.75    66.87    54.97    56.41    54.74
race-middle  9a54b6   accuracy       41.57    41.78    41.23    88.93    88.93    88.93    87.33    87.26    86.28
race-high    9a54b6   accuracy       39.65    39.77    40.77    85.33    85.31    84.62    82.53    82.59    82.02

The table below shows LMDeploy's inference throughput, in requests per second (RPS), with a quantized KV cache:

model              kv type  test settings                             RPS    vs. kv fp16
llama2-chat-7b     fp16     tp1 / ratio 0.8 / bs 256 / prompts 10000  14.98  1.0
llama2-chat-7b     int8     tp1 / ratio 0.8 / bs 256 / prompts 10000  19.01  1.27
llama2-chat-7b     int4     tp1 / ratio 0.8 / bs 256 / prompts 10000  20.81  1.39
llama2-chat-13b    fp16     tp1 / ratio 0.9 / bs 128 / prompts 10000  8.55   1.0
llama2-chat-13b    int8     tp1 / ratio 0.9 / bs 256 / prompts 10000  10.96  1.28
llama2-chat-13b    int4     tp1 / ratio 0.9 / bs 256 / prompts 10000  11.91  1.39
internlm2-chat-7b  fp16     tp1 / ratio 0.8 / bs 256 / prompts 10000  24.13  1.0
internlm2-chat-7b  int8     tp1 / ratio 0.8 / bs 256 / prompts 10000  25.28  1.05
internlm2-chat-7b  int4     tp1 / ratio 0.8 / bs 256 / prompts 10000  25.80  1.07
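The "vs. kv fp16" column is each configuration's RPS divided by the fp16 RPS of the same model. The sketch below recomputes that column from the table. It also adds rough per-token KV cache arithmetic to show why a lower KV precision can fit more tokens, and therefore larger batches, into the same cache budget. The model shape used for the arithmetic (32 layers, 32 KV heads, head dimension 128, matching Llama-2-7B) is an assumption stated in the code, and the estimate ignores the extra scale and zero-point storage that quantization requires.

```python
# RPS values copied from the throughput table: model -> {kv type: RPS}
RPS = {
    "llama2-chat-7b": {"fp16": 14.98, "int8": 19.01, "int4": 20.81},
    "llama2-chat-13b": {"fp16": 8.55, "int8": 10.96, "int4": 11.91},
    "internlm2-chat-7b": {"fp16": 24.13, "int8": 25.28, "int4": 25.80},
}


def speedup_vs_fp16(rps_by_kv):
    """Ratio of each KV type's RPS to the fp16 baseline, rounded to 2 decimals."""
    base = rps_by_kv["fp16"]
    return {kv: round(rps / base, 2) for kv, rps in rps_by_kv.items()}


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bits):
    """Bytes of K and V cached per token, ignoring quantization scales/zero points."""
    elements = 2 * num_layers * num_kv_heads * head_dim  # factor 2: one K and one V
    return elements * bits // 8


if __name__ == "__main__":
    for model, table in RPS.items():
        print(model, speedup_vs_fp16(table))

    # Assumed Llama-2-7B shape: 32 layers, 32 KV heads, head_dim 128
    for name, bits in (("fp16", 16), ("int8", 8), ("int4", 4)):
        print(name, kv_bytes_per_token(32, 32, 128, bits) // 1024, "KiB/token")
```

Under these assumptions, the per-token cache footprint is 512 KiB at fp16, 256 KiB at int8 and 128 KiB at int4. A fixed cache budget therefore holds roughly two or four times as many tokens.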

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

📚 Documentations

🌐 Other

New Contributors

Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.3.0...v0.4.0


Release assets:

1. lmdeploy-0.4.0+cu118-cp310-cp310-manylinux2014_x86_64.whl (72.08 MB)
2. lmdeploy-0.4.0+cu118-cp310-cp310-win_amd64.whl (50.26 MB)
3. lmdeploy-0.4.0+cu118-cp311-cp311-manylinux2014_x86_64.whl (72.09 MB)
4. lmdeploy-0.4.0+cu118-cp311-cp311-win_amd64.whl (50.26 MB)
5. lmdeploy-0.4.0+cu118-cp38-cp38-manylinux2014_x86_64.whl (72.09 MB)
6. lmdeploy-0.4.0+cu118-cp38-cp38-win_amd64.whl (50.26 MB)
7. lmdeploy-0.4.0+cu118-cp39-cp39-manylinux2014_x86_64.whl (72.08 MB)
8. lmdeploy-0.4.0+cu118-cp39-cp39-win_amd64.whl (50.25 MB)
