v0.4.1
Released: 2024-04-24 10:28:08
Highlights
Features
- Support and enhance Command R+ (#3829), MiniCPM (#3893), Meta Llama 3 (#4175, #4182), and Mixtral 8x22B (#4073, #4002)
- Support private model registration and update our support policy (#3871, #3948)
- Support PyTorch 2.2.1 and Triton 2.2.0 (#4061, #4079, #3805, #3904, #4271)
- Add option for using LM Format Enforcer for guided decoding (#3868)
- Add option to skip tokenizer and detokenizer initialization (#3748)
- Add option to load models using `tensorizer` (#3476)
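The guided-decoding option above is selected per request on the OpenAI-compatible server. A minimal sketch of such a request payload, assuming the `guided_json` and `guided_decoding_backend` extra fields as described in the linked PR #3868 (not verified against the 0.4.1 API):

```python
import json

# Hedged sketch: build the per-request payload for JSON-constrained
# decoding. The guided_json / guided_decoding_backend fields are
# assumptions inferred from PR #3868, not a verified API reference.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Return a user record as JSON: ",
    "max_tokens": 100,
    # Constrain output to match this JSON Schema:
    "guided_json": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
    },
    # Select the LM Format Enforcer backend added in #3868:
    "guided_decoding_backend": "lm-format-enforcer",
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/completions on a running server.
```

The payload is ordinary OpenAI-completions JSON plus the two extra vLLM fields, so existing OpenAI clients can pass them through an "extra body" mechanism.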
Enhancements
- vLLM is now mostly type-checked by `mypy` (#3816, #4006, #4161, #4043)
- Progress towards chunked prefill scheduler (#3550, #3853, #4280, #3884)
- Progress towards speculative decoding (#3250, #3706, #3894)
- Initial FP8 support with dynamic per-tensor scaling (#4118)
Hardware
- Intel CPU inference backend is added (#3993, #3634)
- AMD backend is enhanced with Triton kernel and e4m3fn KV cache (#3643, #3290)
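For the new Intel CPU backend, a hedged sketch of a from-source build; the steps and the `VLLM_TARGET_DEVICE` variable are assumptions based on the CPU installation docs of this era, so check the current docs before relying on them:

```shell
# Hedged sketch: build and install the CPU inference backend (#3993, #3634).
# Steps assumed from the CPU installation guide; details may differ.
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -v -r requirements-cpu.txt
# Select the CPU backend at build time (assumed build switch):
VLLM_TARGET_DEVICE=cpu python setup.py install
```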
What's Changed
- [Kernel] Layernorm performance optimization by @mawong-amd in https://github.com/vllm-project/vllm/pull/3662
- [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version by @youkaichao in https://github.com/vllm-project/vllm/pull/3746
- [CI/Build] Make Marlin Tests Green by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3753
- [Misc] Minor fixes in requirements.txt by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3769
- [Misc] Some minor simplifications to detokenization logic by @njhill in https://github.com/vllm-project/vllm/pull/3670
- [Misc] Fix Benchmark TTFT Calculation for Chat Completions by @ywang96 in https://github.com/vllm-project/vllm/pull/3768
- [Speculative decoding 4/9] Lookahead scheduling for speculative decoding by @cadedaniel in https://github.com/vllm-project/vllm/pull/3250
- [Misc] Add support for new autogptq checkpoint_format by @Qubitium in https://github.com/vllm-project/vllm/pull/3689
- [Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup by @cadedaniel in https://github.com/vllm-project/vllm/pull/3783
- [Hardware][Intel] Add CPU inference backend by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3634
- [HotFix] [CI/Build] Minor fix for CPU backend CI by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3787
- [Frontend][Bugfix] allow using the default middleware with a root path by @A-Mahla in https://github.com/vllm-project/vllm/pull/3788
- [Doc] Fix vLLMEngine Doc Page by @ywang96 in https://github.com/vllm-project/vllm/pull/3791
- [CI/Build] fix TORCH_CUDA_ARCH_LIST in wheel build by @youkaichao in https://github.com/vllm-project/vllm/pull/3801
- Fix crash when try torch.cuda.set_device in worker by @leiwen83 in https://github.com/vllm-project/vllm/pull/3770
- [Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` by @mgoin in https://github.com/vllm-project/vllm/pull/3798
- [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary by @youkaichao in https://github.com/vllm-project/vllm/pull/3803
- [Speculative decoding] Adding configuration object for speculative decoding by @cadedaniel in https://github.com/vllm-project/vllm/pull/3706
- [BugFix] Use different mechanism to get vllm version in `is_cpu()` by @njhill in https://github.com/vllm-project/vllm/pull/3804
- [Doc] Update README.md by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3806
- [Doc] Update contribution guidelines for better onboarding by @michaelfeil in https://github.com/vllm-project/vllm/pull/3819
- [3/N] Refactor scheduler for chunked prefill scheduling by @rkooo567 in https://github.com/vllm-project/vllm/pull/3550
- Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) by @AdrianAbeyta in https://github.com/vllm-project/vllm/pull/3290
- [Misc] Publish 3rd meetup slides by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3835
- Fixes the argument for local_tokenizer_group by @sighingnow in https://github.com/vllm-project/vllm/pull/3754
- [Core] Enable hf_transfer by default if available by @michaelfeil in https://github.com/vllm-project/vllm/pull/3817
- [Bugfix] Add kv_scale input parameter to CPU backend by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3840
- [Core] [Frontend] Make detokenization optional by @mgerstgrasser in https://github.com/vllm-project/vllm/pull/3749
- [Bugfix] Fix args in benchmark_serving by @CatherineSue in https://github.com/vllm-project/vllm/pull/3836
- [Benchmark] Refactor sample_requests in benchmark_throughput by @gty111 in https://github.com/vllm-project/vllm/pull/3613
- [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 by @youkaichao in https://github.com/vllm-project/vllm/pull/3805
- [Hardware][CPU] Update cpu torch to match default of 2.2.1 by @mgoin in https://github.com/vllm-project/vllm/pull/3854
- [Model] Cohere CommandR+ by @saurabhdash2512 in https://github.com/vllm-project/vllm/pull/3829
- [Core] improve robustness of pynccl by @youkaichao in https://github.com/vllm-project/vllm/pull/3860
- [Doc] Add asynchronous engine arguments to documentation by @SeanGallen in https://github.com/vllm-project/vllm/pull/3810
- [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels by @youkaichao in https://github.com/vllm-project/vllm/pull/3859
- [Misc] Add pytest marker to opt-out of global test cleanup by @cadedaniel in https://github.com/vllm-project/vllm/pull/3863
- [Misc] Fix linter issues in examples/fp8/quantizer/quantize.py by @cadedaniel in https://github.com/vllm-project/vllm/pull/3864
- [Bugfix] Fixing requirements.txt by @noamgat in https://github.com/vllm-project/vllm/pull/3865
- [Misc] Define common requirements by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3841
- Add option to completion API to truncate prompt tokens by @tdoublep in https://github.com/vllm-project/vllm/pull/3144
- [Chunked Prefill][4/n] Chunked prefill scheduler. by @rkooo567 in https://github.com/vllm-project/vllm/pull/3853
- [Bugfix] Fix incorrect output on OLMo models in Tensor Parallelism by @Isotr0py in https://github.com/vllm-project/vllm/pull/3869
- [CI/Benchmark] add more iteration and use multiple percentiles for robust latency benchmark by @youkaichao in https://github.com/vllm-project/vllm/pull/3889
- [Core] enable out-of-tree model register by @youkaichao in https://github.com/vllm-project/vllm/pull/3871
- [WIP][Core] latency optimization by @youkaichao in https://github.com/vllm-project/vllm/pull/3890
- [Bugfix] Fix Llava inference with Tensor Parallelism. by @Isotr0py in https://github.com/vllm-project/vllm/pull/3883
- [Model] add minicpm by @SUDA-HLT-ywfang in https://github.com/vllm-project/vllm/pull/3893
- [Bugfix] Added Command-R GPTQ support by @egortolmachev in https://github.com/vllm-project/vllm/pull/3849
- [Bugfix] Enable Proper `attention_bias` Usage in Llama Model Configuration by @Ki6an in https://github.com/vllm-project/vllm/pull/3767
- [Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations by @mawong-amd in https://github.com/vllm-project/vllm/pull/3782
- [BugFix][Model] Fix commandr RoPE max_position_embeddings by @esmeetu in https://github.com/vllm-project/vllm/pull/3919
- [Core] separate distributed_init from worker by @youkaichao in https://github.com/vllm-project/vllm/pull/3904
- [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" by @cadedaniel in https://github.com/vllm-project/vllm/pull/3837
- [Bugfix] Fix KeyError on loading GPT-NeoX by @jsato8094 in https://github.com/vllm-project/vllm/pull/3925
- [ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm by @jpvillam-amd in https://github.com/vllm-project/vllm/pull/3643
- [Misc] Avoid loading incorrect LoRA config by @jeejeelee in https://github.com/vllm-project/vllm/pull/3777
- [Benchmark] Add cpu options to bench scripts by @PZD-CHINA in https://github.com/vllm-project/vllm/pull/3915
- [Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable by @zhaotyer in https://github.com/vllm-project/vllm/pull/3955
- [Bugfix] Fix logits processor when prompt_logprobs is not None by @huyiwen in https://github.com/vllm-project/vllm/pull/3899
- [Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3876
- [Bugfix][ROCm] Add numba to Dockerfile.rocm by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3962
- [Model][AMD] ROCm support for 256 head dims for Gemma by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/3972
- [Doc] Add doc to state our model support policy by @youkaichao in https://github.com/vllm-project/vllm/pull/3948
- [Bugfix] Remove key sorting for `guided_json` parameter in OpenAI-compatible server by @dmarasco in https://github.com/vllm-project/vllm/pull/3945
- [Doc] Fix getting started to use publicly available model by @fpaupier in https://github.com/vllm-project/vllm/pull/3963
- [Bugfix] handle hf_config with architectures == None by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3982
- [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators by @youkaichao in https://github.com/vllm-project/vllm/pull/3950
- [Core][5/N] Fully working chunked prefill e2e by @rkooo567 in https://github.com/vllm-project/vllm/pull/3884
- [Core][Model] Use torch.compile to accelerate layernorm in commandr by @youkaichao in https://github.com/vllm-project/vllm/pull/3985
- [Test] Add xformer and flash attn tests by @rkooo567 in https://github.com/vllm-project/vllm/pull/3961
- [Misc] refactor ops and cache_ops layer by @jikunshang in https://github.com/vllm-project/vllm/pull/3913
- [Doc][Installation] delete python setup.py develop by @youkaichao in https://github.com/vllm-project/vllm/pull/3989
- [Kernel] Fused MoE Config for Mixtral 8x22 by @ywang96 in https://github.com/vllm-project/vllm/pull/4002
- fix-bgmv-kernel-640 by @kingljl in https://github.com/vllm-project/vllm/pull/4007
- [Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/3824
- [Core] Set `linear_weights` directly on the layer by @Yard1 in https://github.com/vllm-project/vllm/pull/3977
- [Core][Distributed] make init_distributed_environment compatible with init_process_group by @youkaichao in https://github.com/vllm-project/vllm/pull/4014
- Fix echo/logprob OpenAI completion bug by @dylanwhawk in https://github.com/vllm-project/vllm/pull/3441
- [Kernel] Add extra punica sizes to support bigger vocabs by @Yard1 in https://github.com/vllm-project/vllm/pull/4015
- [BugFix] Fix handling of stop strings and stop token ids by @njhill in https://github.com/vllm-project/vllm/pull/3672
- [Doc] Add typing hints / mypy types cleanup by @michaelfeil in https://github.com/vllm-project/vllm/pull/3816
- [Core] Support LoRA on quantized models by @jeejeelee in https://github.com/vllm-project/vllm/pull/4012
- [Frontend][Core] Move `merge_async_iterators` to utils by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/4026
- [Test] Test multiple attn backend for chunked prefill. by @rkooo567 in https://github.com/vllm-project/vllm/pull/4023
- [Bugfix] fix type hint for py 3.8 by @youkaichao in https://github.com/vllm-project/vllm/pull/4036
- [Misc] Fix typo in scheduler.py by @zhuohan123 in https://github.com/vllm-project/vllm/pull/4022
- [mypy] Add mypy type annotation part 1 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4006
- [Core] fix custom allreduce default value by @youkaichao in https://github.com/vllm-project/vllm/pull/4040
- Fix triton compilation issue by @Bellk17 in https://github.com/vllm-project/vllm/pull/3984
- [Bugfix] Fix LoRA bug by @jeejeelee in https://github.com/vllm-project/vllm/pull/4032
- [CI/Test] expand ruff and yapf for all supported python version by @youkaichao in https://github.com/vllm-project/vllm/pull/4037
- [Bugfix] More type hint fixes for py 3.8 by @dylanwhawk in https://github.com/vllm-project/vllm/pull/4039
- [Core][Distributed] improve logging for init dist by @youkaichao in https://github.com/vllm-project/vllm/pull/4042
- [Bugfix] fix_log_time_in_metrics by @zspo in https://github.com/vllm-project/vllm/pull/4050
- [Bugfix] fix_small_bug_in_neuron_executor by @zspo in https://github.com/vllm-project/vllm/pull/4051
- [Kernel] Add punica dimension for Baichuan-13B by @jeejeelee in https://github.com/vllm-project/vllm/pull/4053
- [Frontend] [Core] feat: Add model loading using `tensorizer` by @sangstar in https://github.com/vllm-project/vllm/pull/3476
- [Core] avoid too many cuda context by caching p2p test by @youkaichao in https://github.com/vllm-project/vllm/pull/4021
- [BugFix] Fix tensorizer extra in setup.py by @njhill in https://github.com/vllm-project/vllm/pull/4072
- [Docs] document that mixtral 8x22b is supported by @simon-mo in https://github.com/vllm-project/vllm/pull/4073
- [Misc] Upgrade triton to 2.2.0 by @esmeetu in https://github.com/vllm-project/vllm/pull/4061
- [Bugfix] Fix filelock version requirement by @zhuohan123 in https://github.com/vllm-project/vllm/pull/4075
- [Misc][Minor] Fix CPU block num log in CPUExecutor. by @bigPYJ1151 in https://github.com/vllm-project/vllm/pull/4088
- [Core] Simplifications to executor classes by @njhill in https://github.com/vllm-project/vllm/pull/4071
- [Doc] Add better clarity for tensorizer usage by @sangstar in https://github.com/vllm-project/vllm/pull/4090
- [Bugfix] Fix ray workers profiling with nsight by @rickyyx in https://github.com/vllm-project/vllm/pull/4095
- [Typing] Fix Sequence type GenericAlias only available after Python 3.9. by @rkooo567 in https://github.com/vllm-project/vllm/pull/4092
- [Core] Fix engine-use-ray broken by @rkooo567 in https://github.com/vllm-project/vllm/pull/4105
- LM Format Enforcer Guided Decoding Support by @noamgat in https://github.com/vllm-project/vllm/pull/3868
- [Core] Refactor model loading code by @Yard1 in https://github.com/vllm-project/vllm/pull/4097
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine by @cadedaniel in https://github.com/vllm-project/vllm/pull/3894
- [Misc] [CI] Fix CI failure caught after merge by @cadedaniel in https://github.com/vllm-project/vllm/pull/4126
- [CI] Move CPU/AMD tests to after wait by @cadedaniel in https://github.com/vllm-project/vllm/pull/4123
- [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication by @youkaichao in https://github.com/vllm-project/vllm/pull/4024
- [Bugfix] fix output parsing error for trtllm backend by @elinx in https://github.com/vllm-project/vllm/pull/4137
- [Kernel] Add punica dimension for Swallow-MS-7B LoRA by @ucciicci in https://github.com/vllm-project/vllm/pull/4134
- [Typing] Mypy typing part 2 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4043
- [Core] Add integrity check during initialization; add test for it by @youkaichao in https://github.com/vllm-project/vllm/pull/4155
- Allow model to be served under multiple names by @hmellor in https://github.com/vllm-project/vllm/pull/2894
- [Bugfix] Get available quantization methods from quantization registry by @mgoin in https://github.com/vllm-project/vllm/pull/4098
- [Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill by @mmoskal in https://github.com/vllm-project/vllm/pull/4128
- [Docs] document that Meta Llama 3 is supported by @simon-mo in https://github.com/vllm-project/vllm/pull/4175
- [Bugfix] Support logprobs when using guided_json and other constrained decoding fields by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/4149
- [Misc] Bump transformers to latest version by @njhill in https://github.com/vllm-project/vllm/pull/4176
- [CI/CD] add neuron docker and ci test scripts by @liangfu in https://github.com/vllm-project/vllm/pull/3571
- [Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) by @agt in https://github.com/vllm-project/vllm/pull/4159
- [Core] add an option to log every function call to for debugging hang/crash in distributed inference by @youkaichao in https://github.com/vllm-project/vllm/pull/4079
- Support eos_token_id from generation_config.json by @simon-mo in https://github.com/vllm-project/vllm/pull/4182
- [Bugfix] Fix LoRA loading check by @jeejeelee in https://github.com/vllm-project/vllm/pull/4138
- Bump version of 0.4.1 by @simon-mo in https://github.com/vllm-project/vllm/pull/4177
- [Misc] fix docstrings by @UranusSeven in https://github.com/vllm-project/vllm/pull/4191
- [Bugfix][Core] Restore logging of stats in the async engine by @ronensc in https://github.com/vllm-project/vllm/pull/4150
- [Misc] add nccl in collect env by @youkaichao in https://github.com/vllm-project/vllm/pull/4211
- Pass `tokenizer_revision` when getting tokenizer in openai serving by @chiragjn in https://github.com/vllm-project/vllm/pull/4214
- [Bugfix] Add fix for JSON whitespace by @ayusher in https://github.com/vllm-project/vllm/pull/4189
- Fix missing docs and out of sync `EngineArgs` by @hmellor in https://github.com/vllm-project/vllm/pull/4219
- [Kernel][FP8] Initial support with dynamic per-tensor scaling by @comaniac in https://github.com/vllm-project/vllm/pull/4118
- [Frontend] multiple sampling params support by @nunjunj in https://github.com/vllm-project/vllm/pull/3570
- Updating lm-format-enforcer version and adding links to decoding libraries in docs by @noamgat in https://github.com/vllm-project/vllm/pull/4222
- Don't show default value for flags in `EngineArgs` by @hmellor in https://github.com/vllm-project/vllm/pull/4223
- [Doc]: Update the page of adding new models by @YeFD in https://github.com/vllm-project/vllm/pull/4236
- Make initialization of tokenizer and detokenizer optional by @GeauxEric in https://github.com/vllm-project/vllm/pull/3748
- [AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring by @hongxiayang in https://github.com/vllm-project/vllm/pull/4129
- [Core][Distributed] fix _is_full_nvlink detection by @youkaichao in https://github.com/vllm-project/vllm/pull/4233
- [Misc] Add vision language model support to CPU backend by @Isotr0py in https://github.com/vllm-project/vllm/pull/3968
- [Bugfix] Fix type annotations in CPU model runner by @WoosukKwon in https://github.com/vllm-project/vllm/pull/4256
- [Frontend] Enable support for CPU backend in AsyncLLMEngine. by @sighingnow in https://github.com/vllm-project/vllm/pull/3993
- [Bugfix] Ensure download_weights_from_hf(..) inside loader is using the revision parameter by @alexm-nm in https://github.com/vllm-project/vllm/pull/4217
- Add example scripts to documentation by @hmellor in https://github.com/vllm-project/vllm/pull/4225
- [Core] Scheduler perf fix by @rkooo567 in https://github.com/vllm-project/vllm/pull/4270
- [Doc] Update the SkyPilot doc with serving and Llama-3 by @Michaelvll in https://github.com/vllm-project/vllm/pull/4276
- [Core][Distributed] use absolute path for library file by @youkaichao in https://github.com/vllm-project/vllm/pull/4271
- Fix `autodoc` directives by @hmellor in https://github.com/vllm-project/vllm/pull/4272
- [Mypy] Part 3 fix typing for nested directories for most of directory by @rkooo567 in https://github.com/vllm-project/vllm/pull/4161
- [Core] Some simplification of WorkerWrapper changes by @njhill in https://github.com/vllm-project/vllm/pull/4183
- [Core] Scheduling optimization 2 by @rkooo567 in https://github.com/vllm-project/vllm/pull/4280
- [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. by @cadedaniel in https://github.com/vllm-project/vllm/pull/3951
- [Bugfix] Fixing max token error message for openai compatible server by @jgordley in https://github.com/vllm-project/vllm/pull/4016
- [Bugfix] Add init_cached_hf_modules to RayWorkerWrapper by @DefTruth in https://github.com/vllm-project/vllm/pull/4286
- [Core][Logging] Add last frame information for better debugging by @youkaichao in https://github.com/vllm-project/vllm/pull/4278
- [CI] Add ccache for wheel builds job by @simon-mo in https://github.com/vllm-project/vllm/pull/4281
- AQLM CUDA support by @jaemzfleming in https://github.com/vllm-project/vllm/pull/3287
- [Bugfix][Frontend] Raise exception when file-like chat template fails to be opened by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/4292
- [Kernel] FP8 support for MoE kernel / Mixtral by @pcmoritz in https://github.com/vllm-project/vllm/pull/4244
- [Bugfix] fixed fp8 conflict with aqlm by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/4307
- [Core][Distributed] use cpu/gloo to initialize pynccl by @youkaichao in https://github.com/vllm-project/vllm/pull/4248
- [CI][Build] change pynvml to nvidia-ml-py by @youkaichao in https://github.com/vllm-project/vllm/pull/4302
- [Misc] Reduce supported Punica dtypes by @WoosukKwon in https://github.com/vllm-project/vllm/pull/4304
New Contributors
- @mawong-amd made their first contribution in https://github.com/vllm-project/vllm/pull/3662
- @Qubitium made their first contribution in https://github.com/vllm-project/vllm/pull/3689
- @bigPYJ1151 made their first contribution in https://github.com/vllm-project/vllm/pull/3634
- @A-Mahla made their first contribution in https://github.com/vllm-project/vllm/pull/3788
- @AdrianAbeyta made their first contribution in https://github.com/vllm-project/vllm/pull/3290
- @mgerstgrasser made their first contribution in https://github.com/vllm-project/vllm/pull/3749
- @CatherineSue made their first contribution in https://github.com/vllm-project/vllm/pull/3836
- @saurabhdash2512 made their first contribution in https://github.com/vllm-project/vllm/pull/3829
- @SeanGallen made their first contribution in https://github.com/vllm-project/vllm/pull/3810
- @SUDA-HLT-ywfang made their first contribution in https://github.com/vllm-project/vllm/pull/3893
- @egortolmachev made their first contribution in https://github.com/vllm-project/vllm/pull/3849
- @Ki6an made their first contribution in https://github.com/vllm-project/vllm/pull/3767
- @jsato8094 made their first contribution in https://github.com/vllm-project/vllm/pull/3925
- @jpvillam-amd made their first contribution in https://github.com/vllm-project/vllm/pull/3643
- @PZD-CHINA made their first contribution in https://github.com/vllm-project/vllm/pull/3915
- @zhaotyer made their first contribution in https://github.com/vllm-project/vllm/pull/3955
- @huyiwen made their first contribution in https://github.com/vllm-project/vllm/pull/3899
- @dmarasco made their first contribution in https://github.com/vllm-project/vllm/pull/3945
- @fpaupier made their first contribution in https://github.com/vllm-project/vllm/pull/3963
- @kingljl made their first contribution in https://github.com/vllm-project/vllm/pull/4007
- @DarkLight1337 made their first contribution in https://github.com/vllm-project/vllm/pull/4026
- @Bellk17 made their first contribution in https://github.com/vllm-project/vllm/pull/3984
- @sangstar made their first contribution in https://github.com/vllm-project/vllm/pull/3476
- @rickyyx made their first contribution in https://github.com/vllm-project/vllm/pull/4095
- @elinx made their first contribution in https://github.com/vllm-project/vllm/pull/4137
- @ucciicci made their first contribution in https://github.com/vllm-project/vllm/pull/4134
- @mmoskal made their first contribution in https://github.com/vllm-project/vllm/pull/4128
- @agt made their first contribution in https://github.com/vllm-project/vllm/pull/4159
- @ayusher made their first contribution in https://github.com/vllm-project/vllm/pull/4189
- @nunjunj made their first contribution in https://github.com/vllm-project/vllm/pull/3570
- @YeFD made their first contribution in https://github.com/vllm-project/vllm/pull/4236
- @GeauxEric made their first contribution in https://github.com/vllm-project/vllm/pull/3748
- @alexm-nm made their first contribution in https://github.com/vllm-project/vllm/pull/4217
- @jgordley made their first contribution in https://github.com/vllm-project/vllm/pull/4016
- @DefTruth made their first contribution in https://github.com/vllm-project/vllm/pull/4286
- @jaemzfleming made their first contribution in https://github.com/vllm-project/vllm/pull/3287
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.4.0...v0.4.1
Release wheels:
- vllm-0.4.1+cu118-cp310-cp310-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1+cu118-cp311-cp311-manylinux1_x86_64.whl (80.16 MB)
- vllm-0.4.1+cu118-cp38-cp38-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1+cu118-cp39-cp39-manylinux1_x86_64.whl (80.12 MB)
- vllm-0.4.1-cp310-cp310-manylinux1_x86_64.whl (80.08 MB)
- vllm-0.4.1-cp311-cp311-manylinux1_x86_64.whl (80.13 MB)
- vllm-0.4.1-cp38-cp38-manylinux1_x86_64.whl (80.09 MB)
- vllm-0.4.1-cp39-cp39-manylinux1_x86_64.whl (80.08 MB)