# v0.4.0

Released: 2024-03-30 09:54:27
Latest release of vllm-project/vllm: v0.4.1 (2024-04-24 10:28:08)
## Major changes

### Models
- New models: Command+R (#3433), Qwen2 MoE (#3346), DBRX (#3660), XVerse (#3610), Jais (#3183).
- New vision language model: LLaVA (#3042)
### Production features
- Automatic prefix caching (#2762, #3703), allowing long system prompts to be automatically cached across requests. Use the flag `--enable-prefix-caching` to turn it on.
- Support for `json_object` in the OpenAI server for arbitrary JSON, a `--use-delay` flag to improve time to first token across many requests, and `min_tokens` for EOS suppression.
- Progress on the chunked prefill scheduler (#3236, #3538) and speculative decoding (#3103).
- The custom all-reduce kernel has been re-enabled after more robustness fixes.
- Replaced the cupy dependency due to its bugs.
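To illustrate how the new OpenAI-server features fit together, the sketch below builds a chat-completion request payload using `json_object` constrained output and the new `min_tokens` parameter. It assumes a vLLM OpenAI-compatible server launched with `--enable-prefix-caching`; the model name and prompt contents are placeholders, and the request is only constructed, not sent:

```python
import json

# Hypothetical request payload for a vLLM OpenAI-compatible server
# started with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model <your-model> --enable-prefix-caching
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        # A long system prompt shared across requests is what automatic
        # prefix caching speeds up: its KV-cache blocks are reused.
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize vLLM v0.4.0 as JSON."},
    ],
    # New in v0.4.0: constrain output to arbitrary valid JSON.
    "response_format": {"type": "json_object"},
    # New in v0.4.0: suppress EOS until at least this many tokens.
    "min_tokens": 16,
}

body = json.dumps(payload)  # serialized request body
```

The `body` string would then be POSTed to the server's `/v1/chat/completions` endpoint with an HTTP client of your choice.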
### Hardware
- Improved Neuron support for AWS Inferentia.
- CMake based build system for extensibility.
### Ecosystem
- Extensive serving benchmark refactoring (#3277)
- Usage statistics collection (#2852)
## What's Changed
- Allow user to choose log level via --log-level instead of fixed 'info' by @AllenDou in https://github.com/vllm-project/vllm/pull/3109
- Reorder kv dtype check to avoid nvcc not found error on AMD platform by @cloudhan in https://github.com/vllm-project/vllm/pull/3104
- Add Automatic Prefix Caching by @SageMoore in https://github.com/vllm-project/vllm/pull/2762
- Add vLLM version info to logs and openai API server by @jasonacox in https://github.com/vllm-project/vllm/pull/3161
- [FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3158
- Make it easy to profile workers with nsight by @pcmoritz in https://github.com/vllm-project/vllm/pull/3162
- [DOC] add setup document to support neuron backend by @liangfu in https://github.com/vllm-project/vllm/pull/2777
- [Minor Fix] Remove unused code in benchmark_prefix_caching.py by @gty111 in https://github.com/vllm-project/vllm/pull/3171
- Add document for vllm paged attention kernel. by @pian13131 in https://github.com/vllm-project/vllm/pull/2978
- enable --gpu-memory-utilization in benchmark_throughput.py by @AllenDou in https://github.com/vllm-project/vllm/pull/3175
- [Minor fix] The domain dns.google may cause a socket.gaierror exception by @ttbachyinsda in https://github.com/vllm-project/vllm/pull/3176
- Push logprob generation to LLMEngine by @Yard1 in https://github.com/vllm-project/vllm/pull/3065
- Add health check, make async Engine more robust by @Yard1 in https://github.com/vllm-project/vllm/pull/3015
- Fix the openai benchmarking requests to work with latest OpenAI apis by @wangchen615 in https://github.com/vllm-project/vllm/pull/2992
- [ROCm] enable cupy in order to enable cudagraph mode for AMD GPUs by @hongxiayang in https://github.com/vllm-project/vllm/pull/3123
- Store `eos_token_id` in `Sequence` for easy access by @njhill in https://github.com/vllm-project/vllm/pull/3166
- [Fix] Avoid pickling entire LLMEngine for Ray workers by @njhill in https://github.com/vllm-project/vllm/pull/3207
- [Tests] Add block manager and scheduler tests by @rkooo567 in https://github.com/vllm-project/vllm/pull/3108
- [Testing] Fix core tests by @cadedaniel in https://github.com/vllm-project/vllm/pull/3224
- A simple addition of `dynamic_ncols=True` by @chujiezheng in https://github.com/vllm-project/vllm/pull/3242
- Add GPTQ support for Gemma by @TechxGenus in https://github.com/vllm-project/vllm/pull/3200
- Update requirements-dev.txt to include package for benchmarking scripts. by @wangchen615 in https://github.com/vllm-project/vllm/pull/3181
- Separate attention backends by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3005
- Measure model memory usage by @mgoin in https://github.com/vllm-project/vllm/pull/3120
- Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) by @jacobthebanana in https://github.com/vllm-project/vllm/pull/3263
- Fix auto prefix bug by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3239
- Connect engine healthcheck to openai server by @njhill in https://github.com/vllm-project/vllm/pull/3260
- Feature add lora support for Qwen2 by @whyiug in https://github.com/vllm-project/vllm/pull/3177
- [Minor Fix] Fix comments in benchmark_serving by @gty111 in https://github.com/vllm-project/vllm/pull/3252
- [Docs] Fix Unmocked Imports by @ywang96 in https://github.com/vllm-project/vllm/pull/3275
- [FIX] Make `flash_attn` optional by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3269
- Move model filelocks from `/tmp/` to `~/.cache/vllm/locks/` dir by @mgoin in https://github.com/vllm-project/vllm/pull/3241
- [FIX] Fix prefix test error on main by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3286
- [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling by @cadedaniel in https://github.com/vllm-project/vllm/pull/3103
- Enhance lora tests with more layer and rank variations by @tterrysun in https://github.com/vllm-project/vllm/pull/3243
- [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA by @dllehr-amd in https://github.com/vllm-project/vllm/pull/3262
- [BugFix] Fix get tokenizer when using ray by @esmeetu in https://github.com/vllm-project/vllm/pull/3301
- [Fix] Fix best_of behavior when n=1 by @njhill in https://github.com/vllm-project/vllm/pull/3298
- Re-enable the 80 char line width limit by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3305
- [docs] Add LoRA support information for models by @pcmoritz in https://github.com/vllm-project/vllm/pull/3299
- Add distributed model executor abstraction by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3191
- [ROCm] Fix warp and lane calculation in blockReduceSum by @kliuae in https://github.com/vllm-project/vllm/pull/3321
- Support Mistral Model Inference with transformers-neuronx by @DAIZHENWEI in https://github.com/vllm-project/vllm/pull/3153
- docs: Add BentoML deployment doc by @Sherlock113 in https://github.com/vllm-project/vllm/pull/3336
- Fixes #1556 double free by @br3no in https://github.com/vllm-project/vllm/pull/3347
- Add kernel for GeGLU with approximate GELU by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3337
- [Fix] fix quantization arg when using marlin by @DreamTeamWangbowen in https://github.com/vllm-project/vllm/pull/3319
- add hf_transfer to requirements.txt by @RonanKMcGovern in https://github.com/vllm-project/vllm/pull/3031
- fix bias in if, ambiguous by @hliuca in https://github.com/vllm-project/vllm/pull/3259
- [Minor Fix] Use cupy-cuda11x in CUDA 11.8 build by @chenxu2048 in https://github.com/vllm-project/vllm/pull/3256
- Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. by @orsharir in https://github.com/vllm-project/vllm/pull/3350
- Add batched RoPE kernel by @tterrysun in https://github.com/vllm-project/vllm/pull/3095
- Fix lint by @Yard1 in https://github.com/vllm-project/vllm/pull/3388
- [FIX] Simpler fix for async engine running on ray by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3371
- [Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion by @simon-mo in https://github.com/vllm-project/vllm/pull/3383
- Allow user to choose which vLLM metrics to display in Grafana by @AllenDou in https://github.com/vllm-project/vllm/pull/3393
- [Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 by @youkaichao in https://github.com/vllm-project/vllm/pull/3389
- Install `flash_attn` in Docker image by @tdoublep in https://github.com/vllm-project/vllm/pull/3396
- Add args for mTLS support by @declark1 in https://github.com/vllm-project/vllm/pull/3410
- [issue templates] add some issue templates by @youkaichao in https://github.com/vllm-project/vllm/pull/3412
- Fix assertion failure in Qwen 1.5 with prefix caching enabled by @chenxu2048 in https://github.com/vllm-project/vllm/pull/3373
- fix marlin config repr by @qeternity in https://github.com/vllm-project/vllm/pull/3414
- Feature: dynamic shared mem moe_align_block_size_kernel by @akhoroshev in https://github.com/vllm-project/vllm/pull/3376
- [Misc] add HOST_IP env var by @youkaichao in https://github.com/vllm-project/vllm/pull/3419
- Add chat templates for Falcon by @Dinghow in https://github.com/vllm-project/vllm/pull/3420
- Add chat templates for ChatGLM by @Dinghow in https://github.com/vllm-project/vllm/pull/3418
- Fix `dist.broadcast` stall without group argument by @GindaChen in https://github.com/vllm-project/vllm/pull/3408
- Fix tie_word_embeddings for Qwen2. by @fyabc in https://github.com/vllm-project/vllm/pull/3344
- [Fix] Add args for mTLS support by @declark1 in https://github.com/vllm-project/vllm/pull/3430
- Fixes the misuse/mixuse of time.time()/time.monotonic() by @sighingnow in https://github.com/vllm-project/vllm/pull/3220
- [Misc] add error message in non linux platform by @youkaichao in https://github.com/vllm-project/vllm/pull/3438
- Fix issue templates by @hmellor in https://github.com/vllm-project/vllm/pull/3436
- fix document error for value and v_vec illustration by @laneeeee in https://github.com/vllm-project/vllm/pull/3421
- Asynchronous tokenization by @Yard1 in https://github.com/vllm-project/vllm/pull/2879
- Removed Extraneous Print Message From OAI Server by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3440
- [Misc] PR templates by @youkaichao in https://github.com/vllm-project/vllm/pull/3413
- Fixes the incorrect argument in the prefix-prefill test cases by @sighingnow in https://github.com/vllm-project/vllm/pull/3246
- Replace `lstrip()` with `removeprefix()` to fix Ruff linter warning by @ronensc in https://github.com/vllm-project/vllm/pull/2958
- Fix Baichuan chat template by @Dinghow in https://github.com/vllm-project/vllm/pull/3340
- [Misc] fix line length for entire codebase by @simon-mo in https://github.com/vllm-project/vllm/pull/3444
- Support arbitrary json_object in OpenAI and Context Free Grammar by @simon-mo in https://github.com/vllm-project/vllm/pull/3211
- Fix setup.py neuron-ls issue by @simon-mo in https://github.com/vllm-project/vllm/pull/2671
- [Misc] Define from_dict and to_dict in InputMetadata by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3452
- [CI] Shard tests for LoRA and Kernels to speed up by @simon-mo in https://github.com/vllm-project/vllm/pull/3445
- [Bugfix] Make moe_align_block_size AMD-compatible by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3470
- CI: Add ROCm Docker Build by @simon-mo in https://github.com/vllm-project/vllm/pull/2886
- [Testing] Add test_config.py to CI by @cadedaniel in https://github.com/vllm-project/vllm/pull/3437
- [CI/Build] Fix Bad Import In Test by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3473
- [Misc] Fix PR Template by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3478
- Cmake based build system by @bnellnm in https://github.com/vllm-project/vllm/pull/2830
- [Core] Zero-copy asdict for InputMetadata by @Yard1 in https://github.com/vllm-project/vllm/pull/3475
- [Misc] Update README for the Third vLLM Meetup by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3479
- [Core] Cache some utils by @Yard1 in https://github.com/vllm-project/vllm/pull/3474
- [Core] print error before deadlock by @youkaichao in https://github.com/vllm-project/vllm/pull/3459
- [Doc] Add docs about OpenAI compatible server by @simon-mo in https://github.com/vllm-project/vllm/pull/3288
- [BugFix] Avoid initializing CUDA too early by @njhill in https://github.com/vllm-project/vllm/pull/3487
- Update dockerfile with ModelScope support by @ifsheldon in https://github.com/vllm-project/vllm/pull/3429
- [Doc] minor fix to neuron-installation.rst by @jimburtoft in https://github.com/vllm-project/vllm/pull/3505
- Revert "[Core] Cache some utils" by @simon-mo in https://github.com/vllm-project/vllm/pull/3507
- [Doc] minor fix of spelling in amd-installation.rst by @jimburtoft in https://github.com/vllm-project/vllm/pull/3506
- Use lru_cache for some environment detection utils by @simon-mo in https://github.com/vllm-project/vllm/pull/3508
- [PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3357
- [Core] Add generic typing to `LRUCache` by @njhill in https://github.com/vllm-project/vllm/pull/3511
- [Misc] Remove cache stream and cache events by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3461
- Abort when nvcc command is not found in the PATH by @AllenDou in https://github.com/vllm-project/vllm/pull/3527
- Check for _is_cuda() in compute_num_jobs by @bnellnm in https://github.com/vllm-project/vllm/pull/3481
- [Bugfix] Fix ROCm support in CMakeLists.txt by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/3534
- [1/n] Triton sampling kernel by @Yard1 in https://github.com/vllm-project/vllm/pull/3186
- [1/n][Chunked Prefill] Refactor input query shapes by @rkooo567 in https://github.com/vllm-project/vllm/pull/3236
- Migrate `logits` computation and gather to `model_runner` by @esmeetu in https://github.com/vllm-project/vllm/pull/3233
- [BugFix] Hot fix in setup.py for neuron build by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3537
- [PREFIX CACHING FOLLOW UP] OrderedDict-based evictor by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3431
- Fix 1D query issue from `_prune_hidden_states` by @rkooo567 in https://github.com/vllm-project/vllm/pull/3539
- [🚀 Ready to be merged] Added support for Jais models by @grandiose-pizza in https://github.com/vllm-project/vllm/pull/3183
- [Misc][Log] Add log for tokenizer length not equal to vocabulary size by @esmeetu in https://github.com/vllm-project/vllm/pull/3500
- [Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3551
- [BugFix] gemma loading after quantization or LoRA. by @taeminlee in https://github.com/vllm-project/vllm/pull/3553
- [Bugfix][Model] Fix Qwen2 by @esmeetu in https://github.com/vllm-project/vllm/pull/3554
- [Hardware][Neuron] Refactor neuron support by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3471
- Some fixes for custom allreduce kernels by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2760
- Dynamic scheduler delay to improve ITL performance by @tdoublep in https://github.com/vllm-project/vllm/pull/3279
- [Core] Improve detokenization performance for prefill by @Yard1 in https://github.com/vllm-project/vllm/pull/3469
- [Bugfix] use SoftLockFile instead of LockFile by @kota-iizuka in https://github.com/vllm-project/vllm/pull/3578
- [Misc] Fix BLOOM copyright notice by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3591
- [Misc] Bump transformers version by @ywang96 in https://github.com/vllm-project/vllm/pull/3592
- [BugFix] Fix Falcon tied embeddings by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3590
- [BugFix] 1D query fix for MoE models by @njhill in https://github.com/vllm-project/vllm/pull/3597
- [CI] typo fix: is_hip --> is_hip() by @youkaichao in https://github.com/vllm-project/vllm/pull/3595
- [CI/Build] respect the common environment variable MAX_JOBS by @youkaichao in https://github.com/vllm-project/vllm/pull/3600
- [CI/Build] fix flaky test by @youkaichao in https://github.com/vllm-project/vllm/pull/3602
- [BugFix] minor fix: method typo in `rotary_embedding.py` file, get_device() -> device by @jikunshang in https://github.com/vllm-project/vllm/pull/3604
- [Bugfix] Revert "[Bugfix] use SoftLockFile instead of LockFile (#3578)" by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3599
- [Model] Add starcoder2 awq support by @shaonianyr in https://github.com/vllm-project/vllm/pull/3569
- [Core] Refactor Attention Take 2 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3462
- [Bugfix] fix automatic prefix args and add log info by @gty111 in https://github.com/vllm-project/vllm/pull/3608
- [CI] Try introducing isort. by @rkooo567 in https://github.com/vllm-project/vllm/pull/3495
- [Core] Adding token ranks along with logprobs by @SwapnilDreams100 in https://github.com/vllm-project/vllm/pull/3516
- feat: implement the min_tokens sampling parameter by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3124
- [Bugfix] API stream returning two stops by @dylanwhawk in https://github.com/vllm-project/vllm/pull/3450
- hotfix isort on logprobs ranks pr by @simon-mo in https://github.com/vllm-project/vllm/pull/3622
- [Feature] Add vision language model support. by @xwjiang2010 in https://github.com/vllm-project/vllm/pull/3042
- Optimize `_get_ranks` in Sampler by @Yard1 in https://github.com/vllm-project/vllm/pull/3623
- [Misc] Include matched stop string/token in responses by @njhill in https://github.com/vllm-project/vllm/pull/2976
- Enable more models to inference based on LoRA by @jeejeelee in https://github.com/vllm-project/vllm/pull/3382
- [Bugfix] Fix ipv6 address parsing bug by @liiliiliil in https://github.com/vllm-project/vllm/pull/3641
- [BugFix] Fix ipv4 address parsing regression by @njhill in https://github.com/vllm-project/vllm/pull/3645
- [Kernel] support non-zero cuda devices in punica kernels by @jeejeelee in https://github.com/vllm-project/vllm/pull/3636
- [Doc]add lora support by @jeejeelee in https://github.com/vllm-project/vllm/pull/3649
- [Misc] Minor fix in KVCache type by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3652
- [Core] remove cupy dependency by @youkaichao in https://github.com/vllm-project/vllm/pull/3625
- [Bugfix] More faithful implementation of Gemma by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3653
- [Bugfix] [Hotfix] fix nccl library name by @youkaichao in https://github.com/vllm-project/vllm/pull/3661
- [Model] Add support for DBRX by @megha95 in https://github.com/vllm-project/vllm/pull/3660
- [Misc] add the "download-dir" option to the latency/throughput benchmarks by @AmadeusChan in https://github.com/vllm-project/vllm/pull/3621
- feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark by @ywang96 in https://github.com/vllm-project/vllm/pull/3277
- Add support for Cohere's Command-R model by @zeppombal in https://github.com/vllm-project/vllm/pull/3433
- [Docs] Add Command-R to supported models by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3669
- [Model] Fix and clean commandr by @esmeetu in https://github.com/vllm-project/vllm/pull/3671
- [Model] Add support for xverse by @hxer7963 in https://github.com/vllm-project/vllm/pull/3610
- [CI/Build] update default number of jobs and nvcc threads to avoid overloading the system by @youkaichao in https://github.com/vllm-project/vllm/pull/3675
- [Kernel] Add Triton MoE kernel configs for DBRX + A100 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3679
- [Core] [Bugfix] Refactor block manager subsystem for better testability by @cadedaniel in https://github.com/vllm-project/vllm/pull/3492
- [Model] Add support for Qwen2MoeModel by @wenyujin333 in https://github.com/vllm-project/vllm/pull/3346
- [Kernel] DBRX Triton MoE kernel H100 by @ywang96 in https://github.com/vllm-project/vllm/pull/3692
- [2/N] Chunked prefill data update by @rkooo567 in https://github.com/vllm-project/vllm/pull/3538
- [Bugfix] Update neuron_executor.py to add optional vision_language_config. by @adamrb in https://github.com/vllm-project/vllm/pull/3695
- fix benchmark format reporting in buildkite by @simon-mo in https://github.com/vllm-project/vllm/pull/3693
- [CI] Add test case to run examples scripts by @simon-mo in https://github.com/vllm-project/vllm/pull/3638
- [Core] Support multi-node inference(eager and cuda graph) by @esmeetu in https://github.com/vllm-project/vllm/pull/3686
- [Kernel] Add MoE Triton kernel configs for A100 40GB by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3700
- [Bugfix] Set enable_prefix_caching=True in prefix caching example by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3703
- fix logging msg for block manager by @simon-mo in https://github.com/vllm-project/vllm/pull/3701
- [Core] fix del of communicator by @youkaichao in https://github.com/vllm-project/vllm/pull/3702
- [Benchmark] Change mii to use persistent deployment and support tensor parallel by @IKACE in https://github.com/vllm-project/vllm/pull/3628
- bump version to v0.4.0 by @simon-mo in https://github.com/vllm-project/vllm/pull/3705
- Revert "bump version to v0.4.0" by @youkaichao in https://github.com/vllm-project/vllm/pull/3708
- [Test] Make model tests run again and remove --forked from pytest by @rkooo567 in https://github.com/vllm-project/vllm/pull/3631
- [Misc] Minor type annotation fix by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3716
- [Core][Test] move local_rank to the last arg with default value to keep api compatible by @youkaichao in https://github.com/vllm-project/vllm/pull/3711
- add ccache to docker build image by @simon-mo in https://github.com/vllm-project/vllm/pull/3704
- Usage Stats Collection by @yhu422 in https://github.com/vllm-project/vllm/pull/2852
- [BugFix] Fix tokenizer out of vocab size by @esmeetu in https://github.com/vllm-project/vllm/pull/3685
- [BugFix][Frontend] Fix completion logprobs=0 error by @esmeetu in https://github.com/vllm-project/vllm/pull/3731
- [Bugfix] Command-R Max Model Length by @ywang96 in https://github.com/vllm-project/vllm/pull/3727
- bump version to v0.4.0 by @simon-mo in https://github.com/vllm-project/vllm/pull/3712
- [ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic by @hongxiayang in https://github.com/vllm-project/vllm/pull/3699
- usage lib get version another way by @simon-mo in https://github.com/vllm-project/vllm/pull/3735
- [BugFix] Use consistent logger everywhere by @njhill in https://github.com/vllm-project/vllm/pull/3738
- [Core][Bugfix] cache len of tokenizer by @youkaichao in https://github.com/vllm-project/vllm/pull/3741
- Fix build when nvtools is missing by @bnellnm in https://github.com/vllm-project/vllm/pull/3698
- CMake build elf without PTX by @simon-mo in https://github.com/vllm-project/vllm/pull/3739
## New Contributors
- @cloudhan made their first contribution in https://github.com/vllm-project/vllm/pull/3104
- @SageMoore made their first contribution in https://github.com/vllm-project/vllm/pull/2762
- @jasonacox made their first contribution in https://github.com/vllm-project/vllm/pull/3161
- @gty111 made their first contribution in https://github.com/vllm-project/vllm/pull/3171
- @pian13131 made their first contribution in https://github.com/vllm-project/vllm/pull/2978
- @ttbachyinsda made their first contribution in https://github.com/vllm-project/vllm/pull/3176
- @wangchen615 made their first contribution in https://github.com/vllm-project/vllm/pull/2992
- @chujiezheng made their first contribution in https://github.com/vllm-project/vllm/pull/3242
- @TechxGenus made their first contribution in https://github.com/vllm-project/vllm/pull/3200
- @mgoin made their first contribution in https://github.com/vllm-project/vllm/pull/3120
- @jacobthebanana made their first contribution in https://github.com/vllm-project/vllm/pull/3263
- @ElizaWszola made their first contribution in https://github.com/vllm-project/vllm/pull/3239
- @DAIZHENWEI made their first contribution in https://github.com/vllm-project/vllm/pull/3153
- @Sherlock113 made their first contribution in https://github.com/vllm-project/vllm/pull/3336
- @br3no made their first contribution in https://github.com/vllm-project/vllm/pull/3347
- @DreamTeamWangbowen made their first contribution in https://github.com/vllm-project/vllm/pull/3319
- @RonanKMcGovern made their first contribution in https://github.com/vllm-project/vllm/pull/3031
- @hliuca made their first contribution in https://github.com/vllm-project/vllm/pull/3259
- @orsharir made their first contribution in https://github.com/vllm-project/vllm/pull/3350
- @youkaichao made their first contribution in https://github.com/vllm-project/vllm/pull/3389
- @tdoublep made their first contribution in https://github.com/vllm-project/vllm/pull/3396
- @declark1 made their first contribution in https://github.com/vllm-project/vllm/pull/3410
- @qeternity made their first contribution in https://github.com/vllm-project/vllm/pull/3414
- @akhoroshev made their first contribution in https://github.com/vllm-project/vllm/pull/3376
- @Dinghow made their first contribution in https://github.com/vllm-project/vllm/pull/3420
- @fyabc made their first contribution in https://github.com/vllm-project/vllm/pull/3344
- @laneeeee made their first contribution in https://github.com/vllm-project/vllm/pull/3421
- @bnellnm made their first contribution in https://github.com/vllm-project/vllm/pull/2830
- @ifsheldon made their first contribution in https://github.com/vllm-project/vllm/pull/3429
- @jimburtoft made their first contribution in https://github.com/vllm-project/vllm/pull/3505
- @grandiose-pizza made their first contribution in https://github.com/vllm-project/vllm/pull/3183
- @taeminlee made their first contribution in https://github.com/vllm-project/vllm/pull/3553
- @kota-iizuka made their first contribution in https://github.com/vllm-project/vllm/pull/3578
- @shaonianyr made their first contribution in https://github.com/vllm-project/vllm/pull/3569
- @SwapnilDreams100 made their first contribution in https://github.com/vllm-project/vllm/pull/3516
- @tjohnson31415 made their first contribution in https://github.com/vllm-project/vllm/pull/3124
- @xwjiang2010 made their first contribution in https://github.com/vllm-project/vllm/pull/3042
- @liiliiliil made their first contribution in https://github.com/vllm-project/vllm/pull/3641
- @AmadeusChan made their first contribution in https://github.com/vllm-project/vllm/pull/3621
- @zeppombal made their first contribution in https://github.com/vllm-project/vllm/pull/3433
- @hxer7963 made their first contribution in https://github.com/vllm-project/vllm/pull/3610
- @wenyujin333 made their first contribution in https://github.com/vllm-project/vllm/pull/3346
- @adamrb made their first contribution in https://github.com/vllm-project/vllm/pull/3695
- @IKACE made their first contribution in https://github.com/vllm-project/vllm/pull/3628
- @yhu422 made their first contribution in https://github.com/vllm-project/vllm/pull/2852
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.3.3...v0.4.0
1. `vllm-0.4.0+cu118-cp310-cp310-manylinux1_x86_64.whl` (109.84 MB)
2. `vllm-0.4.0+cu118-cp311-cp311-manylinux1_x86_64.whl` (109.88 MB)
3. `vllm-0.4.0+cu118-cp38-cp38-manylinux1_x86_64.whl` (109.84 MB)
4. `vllm-0.4.0+cu118-cp39-cp39-manylinux1_x86_64.whl` (109.85 MB)
5. `vllm-0.4.0-cp310-cp310-manylinux1_x86_64.whl` (68.94 MB)
6. `vllm-0.4.0-cp311-cp311-manylinux1_x86_64.whl` (68.97 MB)
7. `vllm-0.4.0-cp38-cp38-manylinux1_x86_64.whl` (68.94 MB)
8. `vllm-0.4.0-cp39-cp39-manylinux1_x86_64.whl` (68.95 MB)
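The `+cu118` wheels are built against CUDA 11.8; the unsuffixed wheels target the release's default CUDA build. A minimal sketch of picking the wheel that matches your interpreter (the `cp310` tag below is an example for Python 3.10, and the install step is shown as a comment since the file must first be downloaded from this release page):

```shell
# Choose the tag matching your Python version: cp38, cp39, cp310, or cp311.
PY_TAG="cp310"
WHEEL="vllm-0.4.0+cu118-${PY_TAG}-${PY_TAG}-manylinux1_x86_64.whl"
echo "Selected: ${WHEEL}"
# After downloading the wheel from this release page:
# pip install "${WHEEL}"
```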