v0.6.0a0
Release date: 2024-08-26 17:12:19
Highlights
- Optimize W4A16 quantized model inference with new GEMM kernels in the TurboMind engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above, i.e., V100 and newer GPUs
- Optimize the prefilling inference stage of PyTorchEngine
- Distinguish between the name of the deployed model and the name of the model's chat template

Before:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json
```

After:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json
```
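
As a usage note (a minimal sketch, not part of the release notes): once the server above is running, clients address it by the served model name through the OpenAI-compatible endpoint, while the chat template is resolved server-side. The base URL and API key below are the lmdeploy defaults and the model name is whatever `--model-name` was set to.

```python
# Minimal sketch: talk to the api_server started above via the
# OpenAI-compatible endpoint. Base URL and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")

# /v1/models reports the value passed to --model-name; requests must use it.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```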
What's Changed
🚀 Features
- support vlm custom image process parameters in openai input format by @irexyc in https://github.com/InternLM/lmdeploy/pull/2245
- New GEMM kernels for weight-only quantization by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2090
- Fix hidden size and support mistral nemo by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2215
- Support custom logits processors by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2329 (see the sketch after this list)
- support openbmb/MiniCPM-V-2_6 by @irexyc in https://github.com/InternLM/lmdeploy/pull/2351
- Support phi3.5 for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2361
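
For the custom logits processors above, here is a hedged sketch of how a processor might be plugged into a pipeline. The `logits_processors` field and the `(input_ids, scores) -> scores` callable signature are assumptions modeled on transformers-style processors; consult the lmdeploy API reference for the exact interface.

```python
# Hedged sketch of PR #2329; the exact interface may differ from this.
import torch
from lmdeploy import GenerationConfig, pipeline

def ban_token(token_id: int):
    """Build a processor that masks out a single token id at every step."""
    def _processor(input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        scores[..., token_id] = float("-inf")
        return scores
    return _processor

pipe = pipeline("internlm/internlm2-chat-7b")  # any supported model path
gen_config = GenerationConfig(logits_processors=[ban_token(0)])  # assumed field
print(pipe(["Hi, please introduce yourself"], gen_config=gen_config))
```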
💥 Improvements
- Remove deprecated arguments from API and clarify model_name and chat_template_name by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1931
- Fix duplicated session_id when pipeline is used by multithreads by @irexyc in https://github.com/InternLM/lmdeploy/pull/2134
- remove eviction param by @grimoire in https://github.com/InternLM/lmdeploy/pull/2285
- Remove QoS serving by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2294
- Support send tool_calls back to internlm2 by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2147
- Add stream options to control usage reporting by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2313 (see the sketch after this list)
- add device type for pytorch engine in cli by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2321
- Update error status_code to raise error in openai client by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2333
- Change to use device instead of device-type in cli by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2337
- Add GEMM test utils by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2342
- Add environment variable to control SILU fusion by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2343
- Use single thread per model instance by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2339
- add cache to speed up docker building by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2344
- add max_prefill_token_num argument in CLI by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2345
- Optimize prefill for long context in the torch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1962
- Refactor turbomind (1/N) by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2352
- feat(server): enable `seed` parameter for openai compatible server by @DearPlanet in https://github.com/InternLM/lmdeploy/pull/2353 (see the sketch after this list)
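
Two of the improvements above, stream usage reporting (#2313) and the `seed` parameter (#2353), surface directly in the OpenAI-compatible API. A short sketch, assuming a running api_server; the server URL and model name are placeholders:

```python
# Sketch: request streamed chat completions with usage accounting and a
# fixed sampling seed. Server URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")
stream = client.chat.completions.create(
    model="the served model name",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries token usage
    seed=42,  # fix the sampling seed for more reproducible outputs
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # only populated on the last chunk
        print("\n", chunk.usage)
```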
🐞 Bug fixes
- Enable running VLMs with the pytorch engine in gradio by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2256
- fix side-effect: failed to update tm model config with tm engine config by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2275
- Fix internvl2 template and update docs by @irexyc in https://github.com/InternLM/lmdeploy/pull/2292
- Fix the issue of missing dependencies in the Dockerfile and pip by @ColorfulDick in https://github.com/InternLM/lmdeploy/pull/2240
- Fix the way to get "quantization_config" from the model's configuration by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2325
- fix(ascend): fix import error of pt engine in cli by @CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2328
- Default rope_scaling_factor of TurbomindEngineConfig to None by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2358
- Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2362
📚 Documentations
- Reorganize the user guide and update the get_started section by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2038
- Cancel Baichuan2-7B AWQ support in the pytorch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/2246
- Add user guide about slora serving by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2084
🌐 Other
- test prtest image update by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2192
- Update python support version by @wuhongsheng in https://github.com/InternLM/lmdeploy/pull/2290
- fix Windows compile error by @zhyncs in https://github.com/InternLM/lmdeploy/pull/2303
- fix: follow up #2303 by @zhyncs in https://github.com/InternLM/lmdeploy/pull/2307
- [ci] benchmark react by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2183
- bump version to v0.6.0a0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2371
New Contributors
- @wuhongsheng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2290
- @ColorfulDick made their first contribution in https://github.com/InternLM/lmdeploy/pull/2240
- @DearPlanet made their first contribution in https://github.com/InternLM/lmdeploy/pull/2353
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.5.3...v0.6.0a0
Assets:

1. lmdeploy-0.6.0a0+cu118-cp310-cp310-manylinux2014_x86_64.whl (83.2 MB)
2. lmdeploy-0.6.0a0+cu118-cp310-cp310-win_amd64.whl (36.51 MB)
3. lmdeploy-0.6.0a0+cu118-cp311-cp311-manylinux2014_x86_64.whl (83.22 MB)
4. lmdeploy-0.6.0a0+cu118-cp311-cp311-win_amd64.whl (36.52 MB)
5. lmdeploy-0.6.0a0+cu118-cp312-cp312-manylinux2014_x86_64.whl (83.23 MB)
6. lmdeploy-0.6.0a0+cu118-cp312-cp312-win_amd64.whl (36.52 MB)
7. lmdeploy-0.6.0a0+cu118-cp38-cp38-manylinux2014_x86_64.whl (83.21 MB)
8. lmdeploy-0.6.0a0+cu118-cp38-cp38-win_amd64.whl (36.52 MB)
9. lmdeploy-0.6.0a0+cu118-cp39-cp39-manylinux2014_x86_64.whl (83.19 MB)
10. lmdeploy-0.6.0a0+cu118-cp39-cp39-win_amd64.whl (36.51 MB)