# v0.4.0

Released: 2024-03-30 09:54:27
Latest release of vllm-project/vllm: v0.4.1 (2024-04-24 10:28:08)
## Major changes

### Models
- New models: Command+R (#3433), Qwen2 MoE (#3346), DBRX (#3660), XVerse (#3610), Jais (#3183).
- New vision language model: LLaVA (#3042)
### Production features
- Automatic prefix caching (#2762, #3703), allowing long system prompts to be automatically cached across requests. Use the flag `--enable-prefix-caching` to turn it on.
- Support for `json_object` in the OpenAI server for arbitrary JSON, a `--use-delay` flag to improve time to first token across many requests, and `min_tokens` for EOS suppression.
- Progress on the chunked prefill scheduler (#3236, #3538) and speculative decoding (#3103).
- The custom all-reduce kernel has been re-enabled after more robustness fixes.
- Replaced the cupy dependency due to its bugs.
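To illustrate how the new OpenAI-server features fit together, the sketch below builds a chat-completion request payload using `json_object` constrained output and the new `min_tokens` parameter. It assumes a vLLM OpenAI-compatible server launched with `--enable-prefix-caching`; the model name and prompt contents are placeholders, and the request is only constructed, not sent:

```python
import json

# Hypothetical request payload for a vLLM OpenAI-compatible server
# started with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model <your-model> --enable-prefix-caching
payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        # A long system prompt shared across requests is what automatic
        # prefix caching speeds up: its KV-cache blocks are reused.
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize vLLM v0.4.0 as JSON."},
    ],
    # New in v0.4.0: constrain output to arbitrary valid JSON.
    "response_format": {"type": "json_object"},
    # New in v0.4.0: suppress EOS until at least this many tokens.
    "min_tokens": 16,
}

body = json.dumps(payload)  # serialized request body
```

The `body` string would then be POSTed to the server's `/v1/chat/completions` endpoint with an HTTP client of your choice.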
### Hardware
- Improved Neuron support for AWS Inferentia.
- CMake based build system for extensibility.
### Ecosystem
- Extensive serving benchmark refactoring (#3277)
- Usage statistics collection (#2852)
## What's Changed
- Allow user to choose log level via --log-level instead of fixed 'info' by @AllenDou in https://github.com/vllm-project/vllm/pull/3109
- Reorder kv dtype check to avoid nvcc not found error on AMD platform by @cloudhan in https://github.com/vllm-project/vllm/pull/3104
- Add Automatic Prefix Caching by @SageMoore in https://github.com/vllm-project/vllm/pull/2762
- Add vLLM version info to logs and openai API server by @jasonacox in https://github.com/vllm-project/vllm/pull/3161
- [FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3158
- Make it easy to profile workers with nsight by @pcmoritz in https://github.com/vllm-project/vllm/pull/3162
- [DOC] add setup document to support neuron backend by @liangfu in https://github.com/vllm-project/vllm/pull/2777
- [Minor Fix] Remove unused code in benchmark_prefix_caching.py by @gty111 in https://github.com/vllm-project/vllm/pull/3171
- Add document for vllm paged attention kernel. by @pian13131 in https://github.com/vllm-project/vllm/pull/2978
- enable --gpu-memory-utilization in benchmark_throughput.py by @AllenDou in https://github.com/vllm-project/vllm/pull/3175
- [Minor fix] The domain dns.google may cause a socket.gaierror exception by @ttbachyinsda in https://github.com/vllm-project/vllm/pull/3176
- Push logprob generation to LLMEngine by @Yard1 in https://github.com/vllm-project/vllm/pull/3065
- Add health check, make async Engine more robust by @Yard1 in https://github.com/vllm-project/vllm/pull/3015
- Fix the openai benchmarking requests to work with latest OpenAI apis by @wangchen615 in https://github.com/vllm-project/vllm/pull/2992
- [ROCm] enable cupy in order to enable cudagraph mode for AMD GPUs by @hongxiayang in https://github.com/vllm-project/vllm/pull/3123
- Store `eos_token_id` in `Sequence` for easy access by @njhill in https://github.com/vllm-project/vllm/pull/3166
- [Fix] Avoid pickling entire LLMEngine for Ray workers by @njhill in https://github.com/vllm-project/vllm/pull/3207
- [Tests] Add block manager and scheduler tests by @rkooo567 in https://github.com/vllm-project/vllm/pull/3108
- [Testing] Fix core tests by @cadedaniel in https://github.com/vllm-project/vllm/pull/3224
- A simple addition of `dynamic_ncols=True` by @chujiezheng in https://github.com/vllm-project/vllm/pull/3242
- Add GPTQ support for Gemma by @TechxGenus in https://github.com/vllm-project/vllm/pull/3200
- Update requirements-dev.txt to include package for benchmarking scripts. by @wangchen615 in https://github.com/vllm-project/vllm/pull/3181
- Separate attention backends by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3005
- Measure model memory usage by @mgoin in https://github.com/vllm-project/vllm/pull/3120
- Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) by @jacobthebanana in https://github.com/vllm-project/vllm/pull/3263
- Fix auto prefix bug by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3239
- Connect engine healthcheck to openai server by @njhill in https://github.com/vllm-project/vllm/pull/3260
- Feature add lora support for Qwen2 by @whyiug in https://github.com/vllm-project/vllm/pull/3177
- [Minor Fix] Fix comments in benchmark_serving by @gty111 in https://github.com/vllm-project/vllm/pull/3252
- [Docs] Fix Unmocked Imports by @ywang96 in https://github.com/vllm-project/vllm/pull/3275
- [FIX] Make `flash_attn` optional by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3269
- Move model filelocks from `/tmp/` to `~/.cache/vllm/locks/` dir by @mgoin in https://github.com/vllm-project/vllm/pull/3241
- [FIX] Fix prefix test error on main by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3286
- [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling by @cadedaniel in https://github.com/vllm-project/vllm/pull/3103
- Enhance lora tests with more layer and rank variations by @tterrysun in https://github.com/vllm-project/vllm/pull/3243
- [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA by @dllehr-amd in https://github.com/vllm-project/vllm/pull/3262
- [BugFix] Fix get tokenizer when using ray by @esmeetu in https://github.com/vllm-project/vllm/pull/3301
- [Fix] Fix best_of behavior when n=1 by @njhill in https://github.com/vllm-project/vllm/pull/3298
- Re-enable the 80 char line width limit by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3305
- [docs] Add LoRA support information for models by @pcmoritz in https://github.com/vllm-project/vllm/pull/3299
- Add distributed model executor abstraction by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3191
- [ROCm] Fix warp and lane calculation in blockReduceSum by @kliuae in https://github.com/vllm-project/vllm/pull/3321
- Support Mistral Model Inference with transformers-neuronx by @DAIZHENWEI in https://github.com/vllm-project/vllm/pull/3153
- docs: Add BentoML deployment doc by @Sherlock113 in https://github.com/vllm-project/vllm/pull/3336
- Fixes #1556 double free by @br3no in https://github.com/vllm-project/vllm/pull/3347
- Add kernel for GeGLU with approximate GELU by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3337
- [Fix] fix quantization arg when using marlin by @DreamTeamWangbowen in https://github.com/vllm-project/vllm/pull/3319
- add hf_transfer to requirements.txt by @RonanKMcGovern in https://github.com/vllm-project/vllm/pull/3031
- fix bias in if, ambiguous by @hliuca in https://github.com/vllm-project/vllm/pull/3259
- [Minor Fix] Use cupy-cuda11x in CUDA 11.8 build by @chenxu2048 in https://github.com/vllm-project/vllm/pull/3256
- Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. by @orsharir in https://github.com/vllm-project/vllm/pull/3350
- Add batched RoPE kernel by @tterrysun in https://github.com/vllm-project/vllm/pull/3095
- Fix lint by @Yard1 in https://github.com/vllm-project/vllm/pull/3388
- [FIX] Simpler fix for async engine running on ray by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3371
- [Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion by @simon-mo in https://github.com/vllm-project/vllm/pull/3383
- Allow user to choose which vLLM metrics to display in Grafana by @AllenDou in https://github.com/vllm-project/vllm/pull/3393
- [Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 by @youkaichao in https://github.com/vllm-project/vllm/pull/3389
- Install `flash_attn` in Docker image by @tdoublep in https://github.com/vllm-project/vllm/pull/3396
- Add args for mTLS support by @declark1 in https://github.com/vllm-project/vllm/pull/3410
- [issue templates] add some issue templates by @youkaichao in https://github.com/vllm-project/vllm/pull/3412
- Fix assertion failure in Qwen 1.5 with prefix caching enabled by @chenxu2048 in https://github.com/vllm-project/vllm/pull/3373
- fix marlin config repr by @qeternity in https://github.com/vllm-project/vllm/pull/3414
- Feature: dynamic shared mem moe_align_block_size_kernel by @akhoroshev in https://github.com/vllm-project/vllm/pull/3376
- [Misc] add HOST_IP env var by @youkaichao in https://github.com/vllm-project/vllm/pull/3419
- Add chat templates for Falcon by @Dinghow in https://github.com/vllm-project/vllm/pull/3420
- Add chat templates for ChatGLM by @Dinghow in https://github.com/vllm-project/vllm/pull/3418
- Fix `dist.broadcast` stall without group argument by @GindaChen in https://github.com/vllm-project/vllm/pull/3408
- Fix tie_word_embeddings for Qwen2. by @fyabc in https://github.com/vllm-project/vllm/pull/3344
- [Fix] Add args for mTLS support by @declark1 in https://github.com/vllm-project/vllm/pull/3430
- Fixes the misuse/mixuse of time.time()/time.monotonic() by @sighingnow in https://github.com/vllm-project/vllm/pull/3220
- [Misc] add error message in non linux platform by @youkaichao in https://github.com/vllm-project/vllm/pull/3438
- Fix issue templates by @hmellor in https://github.com/vllm-project/vllm/pull/3436
- fix document error for value and v_vec illustration by @laneeeee in https://github.com/vllm-project/vllm/pull/3421
- Asynchronous tokenization by @Yard1 in https://github.com/vllm-project/vllm/pull/2879
- Removed Extraneous Print Message From OAI Server by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3440
- [Misc] PR templates by @youkaichao in https://github.com/vllm-project/vllm/pull/3413
- Fixes the incorrect argument in the prefix-prefill test cases by @sighingnow in https://github.com/vllm-project/vllm/pull/3246
- Replace `lstrip()` with `removeprefix()` to fix Ruff linter warning by @ronensc in https://github.com/vllm-project/vllm/pull/2958
- Fix Baichuan chat template by @Dinghow in https://github.com/vllm-project/vllm/pull/3340
- [Misc] fix line length for entire codebase by @simon-mo in https://github.com/vllm-project/vllm/pull/3444
- Support arbitrary json_object in OpenAI and Context Free Grammar by @simon-mo in https://github.com/vllm-project/vllm/pull/3211
- Fix setup.py neuron-ls issue by @simon-mo in https://github.com/vllm-project/vllm/pull/2671
- [Misc] Define from_dict and to_dict in InputMetadata by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3452
- [CI] Shard tests for LoRA and Kernels to speed up by @simon-mo in https://github.com/vllm-project/vllm/pull/3445
- [Bugfix] Make moe_align_block_size AMD-compatible by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3470
- CI: Add ROCm Docker Build by @simon-mo in https://github.com/vllm-project/vllm/pull/2886
- [Testing] Add test_config.py to CI by @cadedaniel in https://github.com/vllm-project/vllm/pull/3437
- [CI/Build] Fix Bad Import In Test by @robertgshaw2-neuralmagic in https://github.com/vllm-project/vllm/pull/3473
- [Misc] Fix PR Template by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3478
- Cmake based build system by @bnellnm in https://github.com/vllm-project/vllm/pull/2830
- [Core] Zero-copy asdict for InputMetadata by @Yard1 in https://github.com/vllm-project/vllm/pull/3475
- [Misc] Update README for the Third vLLM Meetup by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3479
- [Core] Cache some utils by @Yard1 in https://github.com/vllm-project/vllm/pull/3474
- [Core] print error before deadlock by @youkaichao in https://github.com/vllm-project/vllm/pull/3459
- [Doc] Add docs about OpenAI compatible server by @simon-mo in https://github.com/vllm-project/vllm/pull/3288
- [BugFix] Avoid initializing CUDA too early by @njhill in https://github.com/vllm-project/vllm/pull/3487
- Update dockerfile with ModelScope support by @ifsheldon in https://github.com/vllm-project/vllm/pull/3429
- [Doc] minor fix to neuron-installation.rst by @jimburtoft in https://github.com/vllm-project/vllm/pull/3505
- Revert "[Core] Cache some utils" by @simon-mo in https://github.com/vllm-project/vllm/pull/3507
- [Doc] minor fix of spelling in amd-installation.rst by @jimburtoft in https://github.com/vllm-project/vllm/pull/3506
- Use lru_cache for some environment detection utils by @simon-mo in https://github.com/vllm-project/vllm/pull/3508
- [PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3357
- [Core] Add generic typing to `LRUCache` by @njhill in https://github.com/vllm-project/vllm/pull/3511
- [Misc] Remove cache stream and cache events by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3461
- Abort when nvcc command is not found in the PATH by @AllenDou in https://github.com/vllm-project/vllm/pull/3527
- Check for _is_cuda() in compute_num_jobs by @bnellnm in https://github.com/vllm-project/vllm/pull/3481
- [Bugfix] Fix ROCm support in CMakeLists.txt by @jamestwhedbee in https://github.com/vllm-project/vllm/pull/3534
- [1/n] Triton sampling kernel by @Yard1 in https://github.com/vllm-project/vllm/pull/3186
- [1/n][Chunked Prefill] Refactor input query shapes by @rkooo567 in https://github.com/vllm-project/vllm/pull/3236
- Migrate `logits` computation and gather to `model_runner` by @esmeetu in https://github.com/vllm-project/vllm/pull/3233
- [BugFix] Hot fix in setup.py for neuron build by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3537
- [PREFIX CACHING FOLLOW UP] OrderedDict-based evictor by @ElizaWszola in https://github.com/vllm-project/vllm/pull/3431
- Fix 1D query issue from `_prune_hidden_states` by @rkooo567 in https://github.com/vllm-project/vllm/pull/3539
- [🚀 Ready to be merged] Added support for Jais models by @grandiose-pizza in https://github.com/vllm-project/vllm/pull/3183
- [Misc][Log] Add log for tokenizer length not equal to vocabulary size by @esmeetu in https://github.com/vllm-project/vllm/pull/3500
- [Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3551
- [BugFix] gemma loading after quantization or LoRA. by @taeminlee in https://github.com/vllm-project/vllm/pull/3553
- [Bugfix][Model] Fix Qwen2 by @esmeetu in https://github.com/vllm-project/vllm/pull/3554
- [Hardware][Neuron] Refactor neuron support by @zhuohan123 in https://github.com/vllm-project/vllm/pull/3471
- Some fixes for custom allreduce kernels by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2760
- Dynamic scheduler delay to improve ITL performance by @tdoublep in https://github.com/vllm-project/vllm/pull/3279
- [Core] Improve detokenization performance for prefill by @Yard1 in https://github.com/vllm-project/vllm/pull/3469
- [Bugfix] use SoftLockFile instead of LockFile by @kota-iizuka in https://github.com/vllm-project/vllm/pull/3578
- [Misc] Fix BLOOM copyright notice by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3591
- [Misc] Bump transformers version by @ywang96 in https://github.com/vllm-project/vllm/pull/3592
- [BugFix] Fix Falcon tied embeddings by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3590
- [BugFix] 1D query fix for MoE models by @njhill in https://github.com/vllm-project/vllm/pull/3597
- [CI] typo fix: is_hip --> is_hip() by @youkaichao in https://github.com/vllm-project/vllm/pull/3595
- [CI/Build] respect the common environment variable MAX_JOBS by @youkaichao in https://github.com/vllm-project/vllm/pull/3600
- [CI/Build] fix flaky test by @youkaichao in https://github.com/vllm-project/vllm/pull/3602
- [BugFix] minor fix: method typo in `rotary_embedding.py` file, get_device() -> device by @jikunshang in https://github.com/vllm-project/vllm/pull/3604
- [Bugfix] Revert "[Bugfix] use SoftLockFile instead of LockFile (#3578)" by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3599
- [Model] Add starcoder2 awq support by @shaonianyr in https://github.com/vllm-project/vllm/pull/3569
- [Core] Refactor Attention Take 2 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3462
- [Bugfix] fix automatic prefix args and add log info by @gty111 in https://github.com/vllm-project/vllm/pull/3608
- [CI] Try introducing isort. by @rkooo567 in https://github.com/vllm-project/vllm/pull/3495
- [Core] Adding token ranks along with logprobs by @SwapnilDreams100 in https://github.com/vllm-project/vllm/pull/3516
- feat: implement the min_tokens sampling parameter by @tjohnson31415 in https://github.com/vllm-project/vllm/pull/3124
- [Bugfix] API stream returning two stops by @dylanwhawk in https://github.com/vllm-project/vllm/pull/3450
- hotfix isort on logprobs ranks pr by @simon-mo in https://github.com/vllm-project/vllm/pull/3622
- [Feature] Add vision language model support. by @xwjiang2010 in https://github.com/vllm-project/vllm/pull/3042
- Optimize `_get_ranks` in Sampler by @Yard1 in https://github.com/vllm-project/vllm/pull/3623
- [Misc] Include matched stop string/token in responses by @njhill in https://github.com/vllm-project/vllm/pull/2976
- Enable more models to inference based on LoRA by @jeejeelee in https://github.com/vllm-project/vllm/pull/3382
- [Bugfix] Fix ipv6 address parsing bug by @liiliiliil in https://github.com/vllm-project/vllm/pull/3641
- [BugFix] Fix ipv4 address parsing regression by @njhill in https://github.com/vllm-project/vllm/pull/3645
- [Kernel] support non-zero cuda devices in punica kernels by @jeejeelee in https://github.com/vllm-project/vllm/pull/3636
- [Doc]add lora support by @jeejeelee in https://github.com/vllm-project/vllm/pull/3649
- [Misc] Minor fix in KVCache type by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3652
- [Core] remove cupy dependency by @youkaichao in https://github.com/vllm-project/vllm/pull/3625
- [Bugfix] More faithful implementation of Gemma by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3653
- [Bugfix] [Hotfix] fix nccl library name by @youkaichao in https://github.com/vllm-project/vllm/pull/3661
- [Model] Add support for DBRX by @megha95 in https://github.com/vllm-project/vllm/pull/3660
- [Misc] add the "download-dir" option to the latency/throughput benchmarks by @AmadeusChan in https://github.com/vllm-project/vllm/pull/3621
- feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark by @ywang96 in https://github.com/vllm-project/vllm/pull/3277
- Add support for Cohere's Command-R model by @zeppombal in https://github.com/vllm-project/vllm/pull/3433
- [Docs] Add Command-R to supported models by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3669
- [Model] Fix and clean commandr by @esmeetu in https://github.com/vllm-project/vllm/pull/3671
- [Model] Add support for xverse by @hxer7963 in https://github.com/vllm-project/vllm/pull/3610
- [CI/Build] update default number of jobs and nvcc threads to avoid overloading the system by @youkaichao in https://github.com/vllm-project/vllm/pull/3675
- [Kernel] Add Triton MoE kernel configs for DBRX + A100 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3679
- [Core] [Bugfix] Refactor block manager subsystem for better testability by @cadedaniel in https://github.com/vllm-project/vllm/pull/3492
- [Model] Add support for Qwen2MoeModel by @wenyujin333 in https://github.com/vllm-project/vllm/pull/3346
- [Kernel] DBRX Triton MoE kernel H100 by @ywang96 in https://github.com/vllm-project/vllm/pull/3692
- [2/N] Chunked prefill data update by @rkooo567 in https://github.com/vllm-project/vllm/pull/3538
- [Bugfix] Update neuron_executor.py to add optional vision_language_config. by @adamrb in https://github.com/vllm-project/vllm/pull/3695
- fix benchmark format reporting in buildkite by @simon-mo in https://github.com/vllm-project/vllm/pull/3693
- [CI] Add test case to run examples scripts by @simon-mo in https://github.com/vllm-project/vllm/pull/3638
- [Core] Support multi-node inference(eager and cuda graph) by @esmeetu in https://github.com/vllm-project/vllm/pull/3686
- [Kernel] Add MoE Triton kernel configs for A100 40GB by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3700
- [Bugfix] Set enable_prefix_caching=True in prefix caching example by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3703
- fix logging msg for block manager by @simon-mo in https://github.com/vllm-project/vllm/pull/3701
- [Core] fix del of communicator by @youkaichao in https://github.com/vllm-project/vllm/pull/3702
- [Benchmark] Change mii to use persistent deployment and support tensor parallel by @IKACE in https://github.com/vllm-project/vllm/pull/3628
- bump version to v0.4.0 by @simon-mo in https://github.com/vllm-project/vllm/pull/3705
- Revert "bump version to v0.4.0" by @youkaichao in https://github.com/vllm-project/vllm/pull/3708
- [Test] Make model tests run again and remove --forked from pytest by @rkooo567 in https://github.com/vllm-project/vllm/pull/3631
- [Misc] Minor type annotation fix by @WoosukKwon in https://github.com/vllm-project/vllm/pull/3716
- [Core][Test] move local_rank to the last arg with default value to keep api compatible by @youkaichao in https://github.com/vllm-project/vllm/pull/3711
- add ccache to docker build image by @simon-mo in https://github.com/vllm-project/vllm/pull/3704
- Usage Stats Collection by @yhu422 in https://github.com/vllm-project/vllm/pull/2852
- [BugFix] Fix tokenizer out of vocab size by @esmeetu in https://github.com/vllm-project/vllm/pull/3685
- [BugFix][Frontend] Fix completion logprobs=0 error by @esmeetu in https://github.com/vllm-project/vllm/pull/3731
- [Bugfix] Command-R Max Model Length by @ywang96 in https://github.com/vllm-project/vllm/pull/3727
- bump version to v0.4.0 by @simon-mo in https://github.com/vllm-project/vllm/pull/3712
- [ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic by @hongxiayang in https://github.com/vllm-project/vllm/pull/3699
- usage lib get version another way by @simon-mo in https://github.com/vllm-project/vllm/pull/3735
- [BugFix] Use consistent logger everywhere by @njhill in https://github.com/vllm-project/vllm/pull/3738
- [Core][Bugfix] cache len of tokenizer by @youkaichao in https://github.com/vllm-project/vllm/pull/3741
- Fix build when nvtools is missing by @bnellnm in https://github.com/vllm-project/vllm/pull/3698
- CMake build elf without PTX by @simon-mo in https://github.com/vllm-project/vllm/pull/3739
## New Contributors
- @cloudhan made their first contribution in https://github.com/vllm-project/vllm/pull/3104
- @SageMoore made their first contribution in https://github.com/vllm-project/vllm/pull/2762
- @jasonacox made their first contribution in https://github.com/vllm-project/vllm/pull/3161
- @gty111 made their first contribution in https://github.com/vllm-project/vllm/pull/3171
- @pian13131 made their first contribution in https://github.com/vllm-project/vllm/pull/2978
- @ttbachyinsda made their first contribution in https://github.com/vllm-project/vllm/pull/3176
- @wangchen615 made their first contribution in https://github.com/vllm-project/vllm/pull/2992
- @chujiezheng made their first contribution in https://github.com/vllm-project/vllm/pull/3242
- @TechxGenus made their first contribution in https://github.com/vllm-project/vllm/pull/3200
- @mgoin made their first contribution in https://github.com/vllm-project/vllm/pull/3120
- @jacobthebanana made their first contribution in https://github.com/vllm-project/vllm/pull/3263
- @ElizaWszola made their first contribution in https://github.com/vllm-project/vllm/pull/3239
- @DAIZHENWEI made their first contribution in https://github.com/vllm-project/vllm/pull/3153
- @Sherlock113 made their first contribution in https://github.com/vllm-project/vllm/pull/3336
- @br3no made their first contribution in https://github.com/vllm-project/vllm/pull/3347
- @DreamTeamWangbowen made their first contribution in https://github.com/vllm-project/vllm/pull/3319
- @RonanKMcGovern made their first contribution in https://github.com/vllm-project/vllm/pull/3031
- @hliuca made their first contribution in https://github.com/vllm-project/vllm/pull/3259
- @orsharir made their first contribution in https://github.com/vllm-project/vllm/pull/3350
- @youkaichao made their first contribution in https://github.com/vllm-project/vllm/pull/3389
- @tdoublep made their first contribution in https://github.com/vllm-project/vllm/pull/3396
- @declark1 made their first contribution in https://github.com/vllm-project/vllm/pull/3410
- @qeternity made their first contribution in https://github.com/vllm-project/vllm/pull/3414
- @akhoroshev made their first contribution in https://github.com/vllm-project/vllm/pull/3376
- @Dinghow made their first contribution in https://github.com/vllm-project/vllm/pull/3420
- @fyabc made their first contribution in https://github.com/vllm-project/vllm/pull/3344
- @laneeeee made their first contribution in https://github.com/vllm-project/vllm/pull/3421
- @bnellnm made their first contribution in https://github.com/vllm-project/vllm/pull/2830
- @ifsheldon made their first contribution in https://github.com/vllm-project/vllm/pull/3429
- @jimburtoft made their first contribution in https://github.com/vllm-project/vllm/pull/3505
- @grandiose-pizza made their first contribution in https://github.com/vllm-project/vllm/pull/3183
- @taeminlee made their first contribution in https://github.com/vllm-project/vllm/pull/3553
- @kota-iizuka made their first contribution in https://github.com/vllm-project/vllm/pull/3578
- @shaonianyr made their first contribution in https://github.com/vllm-project/vllm/pull/3569
- @SwapnilDreams100 made their first contribution in https://github.com/vllm-project/vllm/pull/3516
- @tjohnson31415 made their first contribution in https://github.com/vllm-project/vllm/pull/3124
- @xwjiang2010 made their first contribution in https://github.com/vllm-project/vllm/pull/3042
- @liiliiliil made their first contribution in https://github.com/vllm-project/vllm/pull/3641
- @AmadeusChan made their first contribution in https://github.com/vllm-project/vllm/pull/3621
- @zeppombal made their first contribution in https://github.com/vllm-project/vllm/pull/3433
- @hxer7963 made their first contribution in https://github.com/vllm-project/vllm/pull/3610
- @wenyujin333 made their first contribution in https://github.com/vllm-project/vllm/pull/3346
- @adamrb made their first contribution in https://github.com/vllm-project/vllm/pull/3695
- @IKACE made their first contribution in https://github.com/vllm-project/vllm/pull/3628
- @yhu422 made their first contribution in https://github.com/vllm-project/vllm/pull/2852
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.3.3...v0.4.0
1. `vllm-0.4.0+cu118-cp310-cp310-manylinux1_x86_64.whl` (109.84 MB)
2. `vllm-0.4.0+cu118-cp311-cp311-manylinux1_x86_64.whl` (109.88 MB)
3. `vllm-0.4.0+cu118-cp38-cp38-manylinux1_x86_64.whl` (109.84 MB)
4. `vllm-0.4.0+cu118-cp39-cp39-manylinux1_x86_64.whl` (109.85 MB)
5. `vllm-0.4.0-cp310-cp310-manylinux1_x86_64.whl` (68.94 MB)
6. `vllm-0.4.0-cp311-cp311-manylinux1_x86_64.whl` (68.97 MB)
7. `vllm-0.4.0-cp38-cp38-manylinux1_x86_64.whl` (68.94 MB)
8. `vllm-0.4.0-cp39-cp39-manylinux1_x86_64.whl` (68.95 MB)
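The `+cu118` wheels are built against CUDA 11.8; the unsuffixed wheels target the release's default CUDA build. A minimal sketch of picking the wheel that matches your interpreter (the `cp310` tag below is an example for Python 3.10, and the install step is shown as a comment since the file must first be downloaded from this release page):

```shell
# Choose the tag matching your Python version: cp38, cp39, cp310, or cp311.
PY_TAG="cp310"
WHEEL="vllm-0.4.0+cu118-${PY_TAG}-${PY_TAG}-manylinux1_x86_64.whl"
echo "Selected: ${WHEEL}"
# After downloading the wheel from this release page:
# pip install "${WHEEL}"
```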