v0.2.6
Release date: 2023-12-18 02:35:42
Latest vllm-project/vllm release: v0.4.1 (2024-04-24 10:28:08)
Major changes
- Fast model execution with CUDA/HIP graph (see the usage sketch after this list)
- W4A16 GPTQ support (thanks to @chu-tianxiang)
- Fix memory profiling with tensor parallelism
- Fix *.bin weight loading for Mixtral models
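Both headline features are driven through the `LLM` constructor. A minimal sketch, assuming the v0.2.6 Python API (the model names are illustrative):

```python
# Minimal sketch, assuming the v0.2.6 Python API; model names are examples.
from vllm import LLM, SamplingParams

# CUDA/HIP graph execution is on by default in this release; pass
# enforce_eager=True to fall back to eager-mode execution.
llm = LLM(model="facebook/opt-125m")

# For a W4A16 GPTQ checkpoint, select the quantization method explicitly.
# Note that v0.2.6 temporarily enforces eager mode for GPTQ models
# (see https://github.com/vllm-project/vllm/pull/2154).
# llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```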
What's Changed
- Fix typing in generate function for AsyncLLMEngine & add toml to requirements-dev by @mezuzza in https://github.com/vllm-project/vllm/pull/2100
- Fix Dockerfile.rocm by @tjtanaa in https://github.com/vllm-project/vllm/pull/2101
- avoid multiple redefinition by @MitchellX in https://github.com/vllm-project/vllm/pull/1817
- Add a flag to include stop string in output text by @yunfeng-scale in https://github.com/vllm-project/vllm/pull/1976 (see the sketch after this list)
- Add GPTQ support by @chu-tianxiang in https://github.com/vllm-project/vllm/pull/916
- [Docs] Add quantization support to docs by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2135
- [ROCm] Temporarily remove GPTQ ROCm support by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2138
- simplify loading weights logic by @esmeetu in https://github.com/vllm-project/vllm/pull/2133
- Optimize model execution with CUDA graph by @WoosukKwon in https://github.com/vllm-project/vllm/pull/1926
- [Minor] Delete Llama tokenizer warnings by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2146
- Fix all-reduce memory usage by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2151
- Pin PyTorch & xformers versions by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2155
- Remove dependency on CuPy by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2152
- [Docs] Add CUDA graph support to docs by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2148
- Temporarily enforce eager mode for GPTQ models by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2154
- [Minor] Add more detailed explanation on `quantization` argument by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2145
- [Minor] Fix xformers version by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2158
- [Minor] Add Phi 2 to supported models by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2159
- Make sampler less blocking by @Yard1 in https://github.com/vllm-project/vllm/pull/1889
- [Minor] Fix a typo in .pt weight support by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2160
- Disable CUDA graph for SqueezeLLM by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2161
- Bump up to v0.2.6 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2157
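PR #1976 above adds an opt-in flag for keeping the matched stop string in the returned text. A hedged sketch, assuming the flag is exposed on `SamplingParams` as `include_stop_str_in_output`:

```python
# Sketch of the stop-string flag from https://github.com/vllm-project/vllm/pull/1976.
# The parameter name include_stop_str_in_output is assumed here.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model

params = SamplingParams(
    max_tokens=64,
    stop=["\n"],                      # stop generating at the first newline
    include_stop_str_in_output=True,  # keep the stop string in the output text
)
print(llm.generate(["Q: What is vLLM?\nA:"], params)[0].outputs[0].text)
```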
New Contributors
- @mezuzza made their first contribution in https://github.com/vllm-project/vllm/pull/2100
- @MitchellX made their first contribution in https://github.com/vllm-project/vllm/pull/1817
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.2.5...v0.2.6
Assets
- vllm-0.2.6+cu118-cp310-cp310-manylinux1_x86_64.whl (9.71 MB)
- vllm-0.2.6+cu118-cp311-cp311-manylinux1_x86_64.whl (9.72 MB)
- vllm-0.2.6+cu118-cp38-cp38-manylinux1_x86_64.whl (9.71 MB)
- vllm-0.2.6+cu118-cp39-cp39-manylinux1_x86_64.whl (9.71 MB)
- vllm-0.2.6-cp310-cp310-manylinux1_x86_64.whl (9.72 MB)
- vllm-0.2.6-cp311-cp311-manylinux1_x86_64.whl (9.74 MB)
- vllm-0.2.6-cp38-cp38-manylinux1_x86_64.whl (9.73 MB)
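The unsuffixed wheels correspond to the default CUDA build published on PyPI, so `pip install vllm==0.2.6` is usually enough; the `+cu118` builds target CUDA 11.8 environments, and the `cp38` through `cp311` tags correspond to CPython 3.8 through 3.11.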