v0.5.0
Released: 2024-07-01 15:22:00
What's Changed
🚀 Features
- support MiniCPM-Llama3-V 2.5 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1708
- [Feature]: Support llava for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1641
- Device dispatcher by @grimoire in https://github.com/InternLM/lmdeploy/pull/1775
- Add GLM-4-9B-Chat by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1724
- Torch deepseek v2 by @grimoire in https://github.com/InternLM/lmdeploy/pull/1621
- Support internvl-chat for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1797
- Add interfaces to the pipeline to obtain logits and ppl by @irexyc in https://github.com/InternLM/lmdeploy/pull/1652
- [Feature]: Support cogvlm-chat by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1502
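Among the features above, PR #1652 adds pipeline interfaces for obtaining logits and perplexity (ppl). As a generic illustration of what a perplexity value means — this is plain math, not the lmdeploy API itself — ppl is the exponential of the mean negative log-likelihood over the sequence:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood).

    token_logprobs: natural-log probabilities the model assigned to each
    observed token (one float per token).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.5 to every token has ppl == 2.
print(round(perplexity([math.log(0.5)] * 4), 6))  # → 2.0
```

Lower perplexity means the model found the sequence more predictable; the actual pipeline interface returns this kind of value computed from the engine's own logits.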
💥 Improvements
- support mistral and llava_mistral in turbomind by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1579
- Add health endpoint by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1679
- upgrade the version of the dependency package peft by @grimoire in https://github.com/InternLM/lmdeploy/pull/1687
- Follow the conventional model_name by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1677
- API Image URL fetch timeout by @vody-am in https://github.com/InternLM/lmdeploy/pull/1684
- Support internlm-xcomposer2-4khd-7b awq by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1666
- update dockerfile and docs by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1715
- lazy import VLAsyncEngine to avoid bringing in VLMs dependencies when deploying LLMs by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1714
- feat: align with OpenAI temperature range by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1733
- feat: align with OpenAI temperature range in api server by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1734
- Refactor converter about get_input_model_registered_name and get_output_model_registered_name_and_config by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1702
- Refine max_new_tokens logic to improve user experience by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1705
- Refactor loading weights by @grimoire in https://github.com/InternLM/lmdeploy/pull/1603
- refactor config by @grimoire in https://github.com/InternLM/lmdeploy/pull/1751
- Add anomaly handler by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1780
- Encode raw image file to base64 by @irexyc in https://github.com/InternLM/lmdeploy/pull/1773
- skip inference for oversized inputs by @grimoire in https://github.com/InternLM/lmdeploy/pull/1769
- fix: prevent numpy breakage by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1791
- More accurate time logging for ImageEncoder and fix concurrent image processing corruption by @irexyc in https://github.com/InternLM/lmdeploy/pull/1765
- Optimize kernel launch for triton2.2.0 and triton2.3.0 by @grimoire in https://github.com/InternLM/lmdeploy/pull/1499
- feat: auto set awq model_format from hf by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1799
- check driver mismatch by @grimoire in https://github.com/InternLM/lmdeploy/pull/1811
- PyTorchEngine adapts to the latest internlm2 modeling. by @grimoire in https://github.com/InternLM/lmdeploy/pull/1798
- AsyncEngine create cancel task in exception. by @grimoire in https://github.com/InternLM/lmdeploy/pull/1807
- compat internlm2 for pytorch engine by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1825
- Add model revision & download_dir to cli by @irexyc in https://github.com/InternLM/lmdeploy/pull/1814
- fix image encoder request queue by @irexyc in https://github.com/InternLM/lmdeploy/pull/1837
- Harden stream callback by @lzhangzz in https://github.com/InternLM/lmdeploy/pull/1838
- Support Qwen2-1.5b awq by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1793
- remove chat template config in turbomind engine by @irexyc in https://github.com/InternLM/lmdeploy/pull/1161
- misc: align PyTorch Engine temperature with TurboMind by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1850
- docs: update cache-max-entry-count help message by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1892
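Several items above (#1733, #1734, #1850) align the sampling temperature with OpenAI's accepted range of [0, 2]. A minimal sketch of that style of validation — the function and constant names here are illustrative, not lmdeploy's actual internals:

```python
# OpenAI's chat API accepts temperature in [0, 2]; engines that divide
# logits by temperature must also guard against temperature == 0.
OPENAI_TEMP_MIN, OPENAI_TEMP_MAX = 0.0, 2.0

def normalize_temperature(temperature: float) -> float:
    """Validate a user-supplied temperature against the OpenAI range."""
    if not OPENAI_TEMP_MIN <= temperature <= OPENAI_TEMP_MAX:
        raise ValueError(
            f"temperature must be in [{OPENAI_TEMP_MIN}, {OPENAI_TEMP_MAX}], "
            f"got {temperature}"
        )
    # Map 0 to a tiny positive value so logits/temperature never divides
    # by zero; the result is effectively greedy decoding.
    return max(temperature, 1e-6)

print(normalize_temperature(0.7))  # → 0.7
```

Rejecting out-of-range values at the API boundary keeps OpenAI-compatible clients from silently getting different sampling behavior than they asked for.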
🐞 Bug fixes
- fix typos by @irexyc in https://github.com/InternLM/lmdeploy/pull/1690
- [Bugfix] fix internvl-1.5-chat vision model preprocess and freeze weights by @DefTruth in https://github.com/InternLM/lmdeploy/pull/1741
- lock setuptools version in dockerfile by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1770
- Fix openai package can not use proxy stream mode by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1692
- Fix finish_reason by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1768
- fix uncached stop words by @grimoire in https://github.com/InternLM/lmdeploy/pull/1754
- [side-effect] Fix param `--cache-max-entry-count` not taking effect (#1758) by @QwertyJack in https://github.com/InternLM/lmdeploy/pull/1778
- support qwen2 1.5b by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1782
- fix falcon attention by @grimoire in https://github.com/InternLM/lmdeploy/pull/1761
- Refine AsyncEngine exception handler by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1789
- [side-effect] fix weight_type caused by PR #1702 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1795
- fix best_match_model by @irexyc in https://github.com/InternLM/lmdeploy/pull/1812
- Fix Request completed log by @irexyc in https://github.com/InternLM/lmdeploy/pull/1821
- fix qwen-vl-chat hung by @irexyc in https://github.com/InternLM/lmdeploy/pull/1824
- Detokenize with prompt token ids by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1753
- Update engine.py to fix small typos by @WANGSSSSSSS in https://github.com/InternLM/lmdeploy/pull/1829
- [side-effect] bring back "--cap" argument in chat cli by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1859
- Fix vl session-len by @AllentDan in https://github.com/InternLM/lmdeploy/pull/1860
- fix gradio vl "stop_words" by @irexyc in https://github.com/InternLM/lmdeploy/pull/1873
- fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1886
- fix model name matching for internvl by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1867
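Several of the fixes above touch generation termination (`finish_reason` in #1768, stop words in #1754). For readers new to OpenAI-style APIs: `finish_reason` is `"stop"` when generation ends on an EOS token or configured stop word, and `"length"` when the token budget runs out first. A hedged, engine-agnostic sketch of that decision (not lmdeploy's actual code path):

```python
def finish_reason(generated_tokens: int, max_new_tokens: int,
                  hit_stop_word: bool):
    """Return an OpenAI-style finish_reason, or None while still generating.

    "stop"   -> the model produced an EOS token or a configured stop word.
    "length" -> the max_new_tokens budget was exhausted first.
    """
    if hit_stop_word:
        return "stop"
    if generated_tokens >= max_new_tokens:
        return "length"
    return None  # generation still in progress

print(finish_reason(128, 128, hit_stop_word=False))  # → length
```

Reporting the wrong reason matters in practice: clients often retry or continue a response only when they see `"length"`, so a truncated answer mislabeled `"stop"` looks complete.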
📚 Documentations
- docs: add BentoLMDeploy in README by @zhyncs in https://github.com/InternLM/lmdeploy/pull/1736
- [Doc]: Update docs for internlm2.5 by @RunningLeon in https://github.com/InternLM/lmdeploy/pull/1887
🌐 Other
- add longtext generation benchmark by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1694
- add qwen2 model into testcase by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1772
- fix pr test for newest internlm2 model by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1806
- refactor test evaluation config by @zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1861
- bump version to v0.5.0 by @lvhan028 in https://github.com/InternLM/lmdeploy/pull/1852
New Contributors
- @DefTruth made their first contribution in https://github.com/InternLM/lmdeploy/pull/1741
- @QwertyJack made their first contribution in https://github.com/InternLM/lmdeploy/pull/1778
- @WANGSSSSSSS made their first contribution in https://github.com/InternLM/lmdeploy/pull/1829
Full Changelog: https://github.com/InternLM/lmdeploy/compare/v0.4.2...v0.5.0
1. lmdeploy-0.5.0+cu118-cp310-cp310-manylinux2014_x86_64.whl 71.37MB
2. lmdeploy-0.5.0+cu118-cp310-cp310-win_amd64.whl 48.84MB
3. lmdeploy-0.5.0+cu118-cp311-cp311-manylinux2014_x86_64.whl 71.39MB
4. lmdeploy-0.5.0+cu118-cp311-cp311-win_amd64.whl 48.85MB
5. lmdeploy-0.5.0+cu118-cp312-cp312-manylinux2014_x86_64.whl 71.39MB
6. lmdeploy-0.5.0+cu118-cp312-cp312-win_amd64.whl 48.85MB
7. lmdeploy-0.5.0+cu118-cp38-cp38-manylinux2014_x86_64.whl 71.38MB
8. lmdeploy-0.5.0+cu118-cp38-cp38-win_amd64.whl 48.84MB
9. lmdeploy-0.5.0+cu118-cp39-cp39-manylinux2014_x86_64.whl 71.37MB
10. lmdeploy-0.5.0+cu118-cp39-cp39-win_amd64.whl 48.85MB