v0.6.0a0
Release date: 2024-08-26 17:12:19
Highlights
- Optimize W4A16 quantized model inference with new GEMM kernels in the TurboMind engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above, i.e., V100 and newer GPUs
- Optimize the prefilling inference stage of PyTorchEngine
- Distinguish between the name of the deployed model and the name of the model's chat template

Before:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json
```

After:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json
```
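
As a usage note (a minimal sketch, not part of the release notes): once the server above is running, clients address it by the served model name through the OpenAI-compatible endpoint, while the chat template is resolved server-side. The base URL and API key below are the lmdeploy defaults and the model name is whatever `--model-name` was set to.

```python
# Minimal sketch: talk to the api_server started above via the
# OpenAI-compatible endpoint. Base URL and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")

# /v1/models reports the value passed to --model-name; requests must use it.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```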
What's Changed
🚀 Features
- support vlm custom image process parameters in openai input format by @irexyc in https://github.com/InternLM/lmdeploy/pull/2245
- New GEMM kernels for weight-only quantization by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2090
- Fix hidden size and support mistral nemo by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2215
- Support custom logits processors by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2329 (see the sketch after this list)
- support openbmb/MiniCPM-V-2_6 by @irexyc in https://github.com/InternLM/lmdeploy/pull/2351
- Support phi3.5 for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2361
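
For the custom logits processors above, here is a hedged sketch of how a processor might be plugged into a pipeline. The `logits_processors` field and the `(input_ids, scores) -> scores` callable signature are assumptions modeled on transformers-style processors; consult the lmdeploy API reference for the exact interface.

```python
# Hedged sketch of PR #2329; the exact interface may differ from this.
import torch
from lmdeploy import GenerationConfig, pipeline

def ban_token(token_id: int):
    """Build a processor that masks out a single token id at every step."""
    def _processor(input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        scores[..., token_id] = float("-inf")
        return scores
    return _processor

pipe = pipeline("internlm/internlm2-chat-7b")  # any supported model path
gen_config = GenerationConfig(logits_processors=[ban_token(0)])  # assumed field
print(pipe(["Hi, please introduce yourself"], gen_config=gen_config))
```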
💥 Improvements
- Remove deprecated arguments from API and clarify model_name and chat_template_name by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1931
- Fix duplicated session_id when pipeline is used by multithreads by @irexyc in https://github.com/InternLM/lmdeploy/pull/2134
- remove eviction param by @grimoire in https://github.com/InternLM/lmdeploy/pull/2285
- Remove QoS serving by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2294
- Support send tool_calls back to internlm2 by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2147
- Add stream options to control usage reporting by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2313 (see the sketch after this list)
- add device type for pytorch engine in cli by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2321
- Update error status_code to raise error in openai client by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2333
- Change to use device instead of device-type in cli by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2337
- Add GEMM test utils by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2342
- Add environment variable to control SILU fusion by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2343
- Use single thread per model instance by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2339
- add cache to speed up docker building by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2344
- add max_prefill_token_num argument in CLI by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2345
- Optimize prefill for long context in the torch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/1962
- Refactor turbomind (1/N) by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/2352
- feat(server): enable `seed` parameter for openai compatible server by @DearPlanet in https://github.com/InternLM/lmdeploy/pull/2353 (see the sketch after this list)
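
Two of the improvements above, stream usage reporting (#2313) and the `seed` parameter (#2353), surface directly in the OpenAI-compatible API. A short sketch, assuming a running api_server; the server URL and model name are placeholders:

```python
# Sketch: request streamed chat completions with usage accounting and a
# fixed sampling seed. Server URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="none")
stream = client.chat.completions.create(
    model="the served model name",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries token usage
    seed=42,  # fix the sampling seed for more reproducible outputs
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # only populated on the last chunk
        print("\n", chunk.usage)
```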
🐞 Bug fixes
- Enable running VLMs with the pytorch engine in gradio by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/2256
- fix side-effect: failed to update tm model config with tm engine config by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2275
- Fix internvl2 template and update docs by @irexyc in https://github.com/InternLM/lmdeploy/pull/2292
- Fix the issue of missing dependencies in the Dockerfile and pip by @ColorfulDick in https://github.com/InternLM/lmdeploy/pull/2240
- Fix the way to get "quantization_config" from the model's configuration by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2325
- fix(ascend): fix import error of pt engine in cli by @CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2328
- Default rope_scaling_factor of TurbomindEngineConfig to None by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2358
- Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2362
📚 Documentations
- Reorganize the user guide and update the get_started section by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2038
- Cancel Baichuan2-7B AWQ support in the pytorch engine by @grimoire in https://github.com/InternLM/lmdeploy/pull/2246
- Add user guide about slora serving by @AllentDan in https://github.com/InternLM/lmdeploy/pull/2084
🌐 Other
- test prtest image update by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2192
- Update python support version by @wuhongsheng in https://github.com/InternLM/lmdeploy/pull/2290
- fix Windows compile error by @zhyncs in https://github.com/InternLM/lmdeploy/pull/2303
- fix: follow up #2303 by @zhyncs in https://github.com/InternLM/lmdeploy/pull/2307
- [ci] benchmark react by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2183
- bump version to v0.6.0a0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/2371
New Contributors
- @wuhongsheng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2290
- @ColorfulDick made their first contribution in https://github.com/InternLM/lmdeploy/pull/2240
- @DearPlanet made their first contribution in https://github.com/InternLM/lmdeploy/pull/2353
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.5.3...v0.6.0a0
Assets:

1. lmdeploy-0.6.0a0+cu118-cp310-cp310-manylinux2014_x86_64.whl (83.2 MB)
2. lmdeploy-0.6.0a0+cu118-cp310-cp310-win_amd64.whl (36.51 MB)
3. lmdeploy-0.6.0a0+cu118-cp311-cp311-manylinux2014_x86_64.whl (83.22 MB)
4. lmdeploy-0.6.0a0+cu118-cp311-cp311-win_amd64.whl (36.52 MB)
5. lmdeploy-0.6.0a0+cu118-cp312-cp312-manylinux2014_x86_64.whl (83.23 MB)
6. lmdeploy-0.6.0a0+cu118-cp312-cp312-win_amd64.whl (36.52 MB)
7. lmdeploy-0.6.0a0+cu118-cp38-cp38-manylinux2014_x86_64.whl (83.21 MB)
8. lmdeploy-0.6.0a0+cu118-cp38-cp38-win_amd64.whl (36.52 MB)
9. lmdeploy-0.6.0a0+cu118-cp39-cp39-manylinux2014_x86_64.whl (83.19 MB)
10. lmdeploy-0.6.0a0+cu118-cp39-cp39-win_amd64.whl (36.51 MB)