v0.4.1
Released: 2024-04-24 10:28:08
Highlights
Features
- Support and enhance Command R+ (#3829), MiniCPM (#3893), Meta Llama 3 (#4175, #4182), and Mixtral 8x22B (#4073, #4002)
- Support private model registration and update our support policy (#3871, #3948)
- Support PyTorch 2.2.1 and Triton 2.2.0 (#4061, #4079, #3805, #3904, #4271)
- Add option for using LM Format Enforcer for guided decoding (#3868)
- Add option to skip tokenizer and detokenizer initialization (#3748)
- Add option to load models using `tensorizer` (#3476)
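The guided-decoding option above is selected per request on the OpenAI-compatible server. A minimal sketch of such a request payload, assuming the `guided_json` and `guided_decoding_backend` extra fields as described in the linked PR #3868 (not verified against the 0.4.1 API):

```python
import json

# Hedged sketch: build the per-request payload for JSON-constrained
# decoding. The guided_json / guided_decoding_backend fields are
# assumptions inferred from PR #3868, not a verified API reference.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Return a user record as JSON: ",
    "max_tokens": 100,
    # Constrain output to match this JSON Schema:
    "guided_json": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
    },
    # Select the LM Format Enforcer backend added in #3868:
    "guided_decoding_backend": "lm-format-enforcer",
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/completions on a running server.
```

The payload is ordinary OpenAI-completions JSON plus the two extra vLLM fields, so existing OpenAI clients can pass them through an "extra body" mechanism.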
Enhancements
- vLLM is now mostly type-checked by `mypy` (#3816, #4006, #4161, #4043)
- Progress towards chunked prefill scheduler (#3550, #3853, #4280, #3884)
- Progress towards speculative decoding (#3250, #3706, #3894)
- Initial FP8 support with dynamic per-tensor scaling (#4118)
Hardware
- Intel CPU inference backend is added (#3993, #3634)
- AMD backend is enhanced with Triton kernel and e4m3fn KV cache (#3643, #3290)
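For the new Intel CPU backend, a hedged sketch of a from-source build; the steps and the `VLLM_TARGET_DEVICE` variable are assumptions based on the CPU installation docs of this era, so check the current docs before relying on them:

```shell
# Hedged sketch: build and install the CPU inference backend (#3993, #3634).
# Steps assumed from the CPU installation guide; details may differ.
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -v -r requirements-cpu.txt
# Select the CPU backend at build time (assumed build switch):
VLLM_TARGET_DEVICE=cpu python setup.py install
```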
What's Changed
- [Kernel] Layernorm performance optimization by @mawong-amd in https://github.com/vllm-project/vllm/pull/3662
- [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version by @youkaichao in https://github.com/vllm-project/vllm/pull/3746
- [CI/Build] Make Marlin Tests Green by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3753
- [Misc] Minor fixes in requirements.txt by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3769
- [Misc] Some minor simplifications to detokenization logic by @njhill in https://github.com/vllm-project/vllm/pull/3670
- [Misc] Fix Benchmark TTFT Calculation for Chat Completions by @ywang96 in https://github.com/vllm-project/vllm/pull/3768
- [Speculative decoding 4/9] Lookahead scheduling for speculative decoding by @cadedaniel in https://github.com/vllm-project/vllm/pull/3250
- [Misc] Add support for new autogptq checkpoint_format by @Qubitium in https://github.com/vllm-project/vllm/pull/3689
- [Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup by @cadedaniel in https://github.com/vllm-project/vllm/pull/3783
- [Hardware][Intel] Add CPU inference backend by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3634
- [HotFix] [CI/Build] Minor fix for CPU backend CI by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3787
- [Frontend][Bugfix] allow using the default middleware with a root path by @A-Mahla in https://github.com/vllm-project/vllm/pull/3788
- [Doc] Fix vLLMEngine Doc Page by @ywang96 in https://github.com/vllm-project/vllm/pull/3791
- [CI/Build] fix TORCH_CUDA_ARCH_LIST in wheel build by @youkaichao in https://github.com/vllm-project/vllm/pull/3801
- Fix crash when try torch.cuda.set_device in worker by @leiwen83 in https://github.com/vllm-project/vllm/pull/3770
- [Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` by @mgoin in https://github.com/vllm-project/vllm/pull/3798
- [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary by @youkaichao in https://github.com/vllm-project/vllm/pull/3803
- [Speculative decoding] Adding configuration object for speculative decoding by @cadedaniel in https://github.com/vllm-project/vllm/pull/3706
- [BugFix] Use different mechanism to get vllm version in `is_cpu()` by @njhill in https://github.com/vllm-project/vllm/pull/3804
- [Doc] Update README.md by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3806
- [Doc] Update contribution guidelines for better onboarding by @michaelfeil in https://github.com/vllm-project/vllm/pull/3819
- [3/N] Refactor scheduler for chunked prefill scheduling by @rkooo567 in https://github.com/vllm-project/vllm/pull/3550
- Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) by @AdrianAbeyta in https://github.com/vllm-project/vllm/pull/3290
- [Misc] Publish 3rd meetup slides by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3835
- Fixes the argument for local_tokenizer_group by @sighingnow in https://github.com/vllm-project/vllm/pull/3754
- [Core] Enable hf_transfer by default if available by @michaelfeil in https://github.com/vllm-project/vllm/pull/3817
- [Bugfix] Add kv_scale input parameter to CPU backend by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3840
- [Core] [Frontend] Make detokenization optional by @mgerstgrasser in https://github.com/vllm-project/vllm/pull/3749
- [Bugfix] Fix args in benchmark_serving by @CatherineSue in https://github.com/vllm-project/vllm/pull/3836
- [Benchmark] Refactor sample_requests in benchmark_throughput by @gty111 in https://github.com/vllm-project/vllm/pull/3613
- [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 by @youkaichao in https://github.com/vllm-project/vllm/pull/3805
- [Hardware][CPU] Update cpu torch to match default of 2.2.1 by @mgoin in https://github.com/vllm-project/vllm/pull/3854
- [Model] Cohere CommandR+ by @saurabhdash2512 in https://github.com/vllm-project/vllm/pull/3829
- [Core] improve robustness of pynccl by @youkaichao in https://github.com/vllm-project/vllm/pull/3860
- [Doc] Add asynchronous engine arguments to documentation by @SeanGallen in https://github.com/vllm-project/vllm/pull/3810
- [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels by @youkaichao in https://github.com/vllm-project/vllm/pull/3859
- [Misc] Add pytest marker to opt-out of global test cleanup by @cadedaniel in https://github.com/vllm-project/vllm/pull/3863
- [Misc] Fix linter issues in examples/fp8/quantizer/quantize.py by @cadedaniel in https://github.com/vllm-project/vllm/pull/3864
- [Bugfix] Fixing requirements.txt by @noamgat in https://github.com/vllm-project/vllm/pull/3865
- [Misc] Define common requirements by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3841
- Add option to completion API to truncate prompt tokens by @tdoublep in https://github.com/vllm-project/vllm/pull/3144
- [Chunked Prefill][4/n] Chunked prefill scheduler. by @rkooo567 in https://github.com/vllm-project/vllm/pull/3853
- [Bugfix] Fix incorrect output on OLMo models in Tensor Parallelism by @Isotr0py in https://github.com/vllm-project/vllm/pull/3869
- [CI/Benchmark] add more iteration and use multiple percentiles for robust latency benchmark by @youkaichao in https://github.com/vllm-project/vllm/pull/3889
- [Core] enable out-of-tree model register by @youkaichao in https://github.com/vllm-project/vllm/pull/3871
- [WIP][Core] latency optimization by @youkaichao in https://github.com/vllm-project/vllm/pull/3890
- [Bugfix] Fix Llava inference with Tensor Parallelism. by @Isotr0py in https://github.com/vllm-project/vllm/pull/3883
- [Model] add minicpm by @SUDA-HLT-ywfang in https://github.com/vllm-project/vllm/pull/3893
- [Bugfix] Added Command-R GPTQ support by @egortolmachev in https://github.com/vllm-project/vllm/pull/3849
- [Bugfix] Enable Proper `attention_bias` Usage in Llama Model Configuration by @Ki6an in https://github.com/vllm-project/vllm/pull/3767
- [Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations by @mawong-amd in https://github.com/vllm-project/vllm/pull/3782
- [BugFix][Model] Fix commandr RoPE max_position_embeddings by @esmeetu in https://github.com/vllm-project/vllm/pull/3919
- [Core] separate distributed_init from worker by @youkaichao in https://github.com/vllm-project/vllm/pull/3904
- [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" by @cadedaniel in https://github.com/vllm-project/vllm/pull/3837
- [Bugfix] Fix KeyError on loading GPT-NeoX by @jsato8094 in https://github.com/vllm-project/vllm/pull/3925
- [ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm by @jpvillam-amd in https://github.com/vllm-project/vllm/pull/3643
- [Misc] Avoid loading incorrect LoRA config by @jeejeelee in https://github.com/vllm-project/vllm/pull/3777
- [Benchmark] Add cpu options to bench scripts by @PZD-CHINA in https://github.com/vllm-project/vllm/pull/3915
- [Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable by @zhaotyer in https://github.com/vllm-project/vllm/pull/3955
- [Bugfix] Fix logits processor when prompt_logprobs is not None by @huyiwen in https://github.com/vllm-project/vllm/pull/3899
- [Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3876
- [Bugfix][ROCm] Add numba to Dockerfile.rocm by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3962
- [Model][AMD] ROCm support for 256 head dims for Gemma by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/3972
- [Doc] Add doc to state our model support policy by @youkaichao in https://github.com/vllm-project/vllm/pull/3948
- [Bugfix] Remove key sorting for `guided_json` parameter in OpenAI-compatible server by @dmarasco in https://github.com/vllm-project/vllm/pull/3945
- [Doc] Fix getting started to use publicly available model by @fpaupier in https://github.com/vllm-project/vllm/pull/3963
- [Bugfix] handle hf_config with architectures == None by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3982
- [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators by @youkaichao in https://github.com/vllm-project/vllm/pull/3950
- [Core][5/N] Fully working chunked prefill e2e by @rkooo567 in https://github.com/vllm-project/vllm/pull/3884
- [Core][Model] Use torch.compile to accelerate layernorm in commandr by @youkaichao in https://github.com/vllm-project/vllm/pull/3985
- [Test] Add xformer and flash attn tests by @rkooo567 in https://github.com/vllm-project/vllm/pull/3961
- [Misc] refactor ops and cache_ops layer by @jikunshang in https://github.com/vllm-project/vllm/pull/3913
- [Doc][Installation] delete python setup.py develop by @youkaichao in https://github.com/vllm-project/vllm/pull/3989
- [Kernel] Fused MoE Config for Mixtral 8x22 by @ywang96 in https://github.com/vllm-project/vllm/pull/4002
- fix-bgmv-kernel-640 by @kingljl in https://github.com/vllm-project/vllm/pull/4007
- [Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3824
- [Core] Set `linear_weights` directly on the layer by @Yard1 in https://github.com/vllm-project/vllm/pull/3977
- [Core][Distributed] make init_distributed_environment compatible with init_process_group by @youkaichao in https://github.com/vllm-project/vllm/pull/4014
- Fix echo/logprob OpenAI completion bug by @dylanwhawk in https://github.com/vllm-project/vllm/pull/3441
- [Kernel] Add extra punica sizes to support bigger vocabs by @Yard1 in https://github.com/vllm-project/vllm/pull/4015
- [BugFix] Fix handling of stop strings and stop token ids by @njhill in https://github.com/vllm-project/vllm/pull/3672
- [Doc] Add typing hints / mypy types cleanup by @michaelfeil in https://github.com/vllm-project/vllm/pull/3816
- [Core] Support LoRA on quantized models by @jeejeelee in https://github.com/vllm-project/vllm/pull/4012
- [Frontend][Core] Move `merge_async_iterators` to utils by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/4026
- [Test] Test multiple attn backend for chunked prefill. by @rkooo567 in https://github.com/vllm-project/vllm/pull/4023
- [Bugfix] fix type hint for py 3.8 by @youkaichao in https://github.com/vllm-project/vllm/pull/4036
- [Misc] Fix typo in scheduler.py by @zhuohan123 in https://github.com/vllm-project/vllm/pull/4022
- [mypy] Add mypy type annotation part 1 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4006
- [Core] fix custom allreduce default value by @youkaichao in https://github.com/vllm-project/vllm/pull/4040
- Fix triton compilation issue by @Bellk17 in https://github.com/vllm-project/vllm/pull/3984
- [Bugfix] Fix LoRA bug by @jeejeelee in https://github.com/vllm-project/vllm/pull/4032
- [CI/Test] expand ruff and yapf for all supported python version by @youkaichao in https://github.com/vllm-project/vllm/pull/4037
- [Bugfix] More type hint fixes for py 3.8 by @dylanwhawk in https://github.com/vllm-project/vllm/pull/4039
- [Core][Distributed] improve logging for init dist by @youkaichao in https://github.com/vllm-project/vllm/pull/4042
- [Bugfix] fix_log_time_in_metrics by @zspo in https://github.com/vllm-project/vllm/pull/4050
- [Bugfix] fix_small_bug_in_neuron_executor by @zspo in https://github.com/vllm-project/vllm/pull/4051
- [Kernel] Add punica dimension for Baichuan-13B by @jeejeelee in https://github.com/vllm-project/vllm/pull/4053
- [Frontend] [Core] feat: Add model loading using `tensorizer` by @sangstar in https://github.com/vllm-project/vllm/pull/3476
- [Core] avoid too many cuda context by caching p2p test by @youkaichao in https://github.com/vllm-project/vllm/pull/4021
- [BugFix] Fix tensorizer extra in setup.py by @njhill in https://github.com/vllm-project/vllm/pull/4072
- [Docs] document that mixtral 8x22b is supported by @simon-mo in https://github.com/vllm-project/vllm/pull/4073
- [Misc] Upgrade triton to 2.2.0 by @esmeetu in https://github.com/vllm-project/vllm/pull/4061
- [Bugfix] Fix filelock version requirement by @zhuohan123 in https://github.com/vllm-project/vllm/pull/4075
- [Misc][Minor] Fix CPU block num log in CPUExecutor. by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/4088
- [Core] Simplifications to executor classes by @njhill in https://github.com/vllm-project/vllm/pull/4071
- [Doc] Add better clarity for tensorizer usage by @sangstar in https://github.com/vllm-project/vllm/pull/4090
- [Bugfix] Fix ray workers profiling with nsight by @rickyyx in https://github.com/vllm-project/vllm/pull/4095
- [Typing] Fix Sequence type GenericAlias only available after Python 3.9. by @rkooo567 in https://github.com/vllm-project/vllm/pull/4092
- [Core] Fix engine-use-ray broken by @rkooo567 in https://github.com/vllm-project/vllm/pull/4105
- LM Format Enforcer Guided Decoding Support by @noamgat in https://github.com/vllm-project/vllm/pull/3868
- [Core] Refactor model loading code by @Yard1 in https://github.com/vllm-project/vllm/pull/4097
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine by @cadedaniel in https://github.com/vllm-project/vllm/pull/3894
- [Misc] [CI] Fix CI failure caught after merge by @cadedaniel in https://github.com/vllm-project/vllm/pull/4126
- [CI] Move CPU/AMD tests to after wait by @cadedaniel in https://github.com/vllm-project/vllm/pull/4123
- [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication by @youkaichao in https://github.com/vllm-project/vllm/pull/4024
- [Bugfix] fix output parsing error for trtllm backend by @elinx in https://github.com/vllm-project/vllm/pull/4137
- [Kernel] Add punica dimension for Swallow-MS-7B LoRA by @ucciicci in https://github.com/vllm-project/vllm/pull/4134
- [Typing] Mypy typing part 2 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4043
- [Core] Add integrity check during initialization; add test for it by @youkaichao in https://github.com/vllm-project/vllm/pull/4155
- Allow model to be served under multiple names by @hmellor in https://github.com/vllm-project/vllm/pull/2894
- [Bugfix] Get available quantization methods from quantization registry by @mgoin in https://github.com/vllm-project/vllm/pull/4098
- [Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill by @mmoskal in https://github.com/vllm-project/vllm/pull/4128
- [Docs] document that Meta Llama 3 is supported by @simon-mo in https://github.com/vllm-project/vllm/pull/4175
- [Bugfix] Support logprobs when using guided_json and other constrained decoding fields by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/4149
- [Misc] Bump transformers to latest version by @njhill in https://github.com/vllm-project/vllm/pull/4176
- [CI/CD] add neuron docker and ci test scripts by @liangfu in https://github.com/vllm-project/vllm/pull/3571
- [Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) by @agt in https://github.com/vllm-project/vllm/pull/4159
- [Core] add an option to log every function call to for debugging hang/crash in distributed inference by @youkaichao in https://github.com/vllm-project/vllm/pull/4079
- Support eos_token_id from generation_config.json by @simon-mo in https://github.com/vllm-project/vllm/pull/4182
- [Bugfix] Fix LoRA loading check by @jeejeelee in https://github.com/vllm-project/vllm/pull/4138
- Bump version of 0.4.1 by @simon-mo in https://github.com/vllm-project/vllm/pull/4177
- [Misc] fix docstrings by @UranusSeven in https://github.com/vllm-project/vllm/pull/4191
- [Bugfix][Core] Restore logging of stats in the async engine by @ronensc in https://github.com/vllm-project/vllm/pull/4150
- [Misc] add nccl in collect env by @youkaichao in https://github.com/vllm-project/vllm/pull/4211
- Pass `tokenizer_revision` when getting tokenizer in openai serving by @chiragjn in https://github.com/vllm-project/vllm/pull/4214
- [Bugfix] Add fix for JSON whitespace by @ayusher in https://github.com/vllm-project/vllm/pull/4189
- Fix missing docs and out of sync `EngineArgs` by @hmellor in https://github.com/vllm-project/vllm/pull/4219
- [Kernel][FP8] Initial support with dynamic per-tensor scaling by @comaniac in https://github.com/vllm-project/vllm/pull/4118
- [Frontend] multiple sampling params support by @nunjunj in https://github.com/vllm-project/vllm/pull/3570
- Updating lm-format-enforcer version and adding links to decoding libraries in docs by @noamgat in https://github.com/vllm-project/vllm/pull/4222
- Don't show default value for flags in `EngineArgs` by @hmellor in https://github.com/vllm-project/vllm/pull/4223
- [Doc]: Update the page of adding new models by @YeFD in https://github.com/vllm-project/vllm/pull/4236
- Make initialization of tokenizer and detokenizer optional by @GeauxEric in https://github.com/vllm-project/vllm/pull/3748
- [AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring by @hongxiayang in https://github.com/vllm-project/vllm/pull/4129
- [Core][Distributed] fix _is_full_nvlink detection by @youkaichao in https://github.com/vllm-project/vllm/pull/4233
- [Misc] Add vision language model support to CPU backend by @Isotr0py in https://github.com/vllm-project/vllm/pull/3968
- [Bugfix] Fix type annotations in CPU model runner by @WoosukKwon in https://github.com/vllm-project/vllm/pull/4256
- [Frontend] Enable support for CPU backend in AsyncLLMEngine. by @sighingnow in https://github.com/vllm-project/vllm/pull/3993
- [Bugfix] Ensure download_weights_from_hf(..) inside loader is using the revision parameter by @alexm-nm in https://github.com/vllm-project/vllm/pull/4217
- Add example scripts to documentation by @hmellor in https://github.com/vllm-project/vllm/pull/4225
- [Core] Scheduler perf fix by @rkooo567 in https://github.com/vllm-project/vllm/pull/4270
- [Doc] Update the SkyPilot doc with serving and Llama-3 by @Michaelvll in https://github.com/vllm-project/vllm/pull/4276
- [Core][Distributed] use absolute path for library file by @youkaichao in https://github.com/vllm-project/vllm/pull/4271
- Fix `autodoc` directives by @hmellor in https://github.com/vllm-project/vllm/pull/4272
- [Mypy] Part 3 fix typing for nested directories for most of directory by @rkooo567 in https://github.com/vllm-project/vllm/pull/4161
- [Core] Some simplification of WorkerWrapper changes by @njhill in https://github.com/vllm-project/vllm/pull/4183
- [Core] Scheduling optimization 2 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4280
- [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. by @cadedaniel in https://github.com/vllm-project/vllm/pull/3951
- [Bugfix] Fixing max token error message for openai compatible server by @jgordley in https://github.com/vllm-project/vllm/pull/4016
- [Bugfix] Add init_cached_hf_modules to RayWorkerWrapper by @DefTruth in https://github.com/vllm-project/vllm/pull/4286
- [Core][Logging] Add last frame information for better debugging by @youkaichao in https://github.com/vllm-project/vllm/pull/4278
- [CI] Add ccache for wheel builds job by @simon-mo in https://github.com/vllm-project/vllm/pull/4281
- AQLM CUDA support by @jaemzfleming in https://github.com/vllm-project/vllm/pull/3287
- [Bugfix][Frontend] Raise exception when file-like chat template fails to be opened by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/4292
- [Kernel] FP8 support for MoE kernel / Mixtral by @pcmoritz in https://github.com/vllm-project/vllm/pull/4244
- [Bugfix] fixed fp8 conflict with aqlm by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/4307
- [Core][Distributed] use cpu/gloo to initialize pynccl by @youkaichao in https://github.com/vllm-project/vllm/pull/4248
- [CI][Build] change pynvml to nvidia-ml-py by @youkaichao in https://github.com/vllm-project/vllm/pull/4302
- [Misc] Reduce supported Punica dtypes by @WoosukKwon in https://github.com/vllm-project/vllm/pull/4304
New Contributors
- @mawong-amd made their first contribution in https://github.com/vllm-project/vllm/pull/3662
- @Qubitium made their first contribution in https://github.com/vllm-project/vllm/pull/3689
- @bigPYJ1151 made their first contribution in https://github.com/vllm-project/vllm/pull/3634
- @A-Mahla made their first contribution in https://github.com/vllm-project/vllm/pull/3788
- @AdrianAbeyta made their first contribution in https://github.com/vllm-project/vllm/pull/3290
- @mgerstgrasser made their first contribution in https://github.com/vllm-project/vllm/pull/3749
- @CatherineSue made their first contribution in https://github.com/vllm-project/vllm/pull/3836
- @saurabhdash2512 made their first contribution in https://github.com/vllm-project/vllm/pull/3829
- @SeanGallen made their first contribution in https://github.com/vllm-project/vllm/pull/3810
- @SUDA-HLT-ywfang made their first contribution in https://github.com/vllm-project/vllm/pull/3893
- @egortolmachev made their first contribution in https://github.com/vllm-project/vllm/pull/3849
- @Ki6an made their first contribution in https://github.com/vllm-project/vllm/pull/3767
- @jsato8094 made their first contribution in https://github.com/vllm-project/vllm/pull/3925
- @jpvillam-amd made their first contribution in https://github.com/vllm-project/vllm/pull/3643
- @PZD-CHINA made their first contribution in https://github.com/vllm-project/vllm/pull/3915
- @zhaotyer made their first contribution in https://github.com/vllm-project/vllm/pull/3955
- @huyiwen made their first contribution in https://github.com/vllm-project/vllm/pull/3899
- @dmarasco made their first contribution in https://github.com/vllm-project/vllm/pull/3945
- @fpaupier made their first contribution in https://github.com/vllm-project/vllm/pull/3963
- @kingljl made their first contribution in https://github.com/vllm-project/vllm/pull/4007
- @DarkLight1337 made their first contribution in https://github.com/vllm-project/vllm/pull/4026
- @Bellk17 made their first contribution in https://github.com/vllm-project/vllm/pull/3984
- @sangstar made their first contribution in https://github.com/vllm-project/vllm/pull/3476
- @rickyyx made their first contribution in https://github.com/vllm-project/vllm/pull/4095
- @elinx made their first contribution in https://github.com/vllm-project/vllm/pull/4137
- @ucciicci made their first contribution in https://github.com/vllm-project/vllm/pull/4134
- @mmoskal made their first contribution in https://github.com/vllm-project/vllm/pull/4128
- @agt made their first contribution in https://github.com/vllm-project/vllm/pull/4159
- @ayusher made their first contribution in https://github.com/vllm-project/vllm/pull/4189
- @nunjunj made their first contribution in https://github.com/vllm-project/vllm/pull/3570
- @YeFD made their first contribution in https://github.com/vllm-project/vllm/pull/4236
- @GeauxEric made their first contribution in https://github.com/vllm-project/vllm/pull/3748
- @alexm-nm made their first contribution in https://github.com/vllm-project/vllm/pull/4217
- @jgordley made their first contribution in https://github.com/vllm-project/vllm/pull/4016
- @DefTruth made their first contribution in https://github.com/vllm-project/vllm/pull/4286
- @jaemzfleming made their first contribution in https://github.com/vllm-project/vllm/pull/3287
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.4.0...v0.4.1
Release wheels:
- vllm-0.4.1+cu118-cp310-cp310-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1+cu118-cp311-cp311-manylinux1_x86_64.whl (80.16 MB)
- vllm-0.4.1+cu118-cp38-cp38-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1+cu118-cp39-cp39-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1-cp310-cp310-manylinux1_x86_64.whl (80.08 MB)
- vllm-0.4.1-cp311-cp311-manylinux1_x86_64.whl (80.13 MB)
- vllm-0.4.1-cp38-cp38-manylinux1_x86_64.whl (80.09 MB)
- vllm-0.4.1-cp39-cp39-manylinux1_x86_64.whl (80.08 MB)