v0.3.0
Released: 2024-01-31 16:07:57
Latest vllm-project/vllm release: v0.4.1 (2024-04-24 10:28:08)
Major Changes
- Experimental multi-LoRA support (see the usage sketch after this list)
- Experimental prefix caching support
- FP8 KV Cache support
- Optimized MoE performance and Deepseek MoE support
- CI tested PRs
- Support batch completion in server
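The flagship features above can be exercised from the offline `LLM` API. Below is a minimal multi-LoRA sketch; the model name and adapter path are placeholders, not taken from this release note, and the FP8 KV cache from #2279 is opted into separately via the `kv_cache_dtype="fp8_e5m2"` engine argument.

```python
# Minimal multi-LoRA sketch against the offline LLM API.
# The base model and adapter path below are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora turns on the multi-LoRA path added in #1804.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request can carry its own adapter: the string names it, the int id
# identifies it inside the engine, and the path points at a locally saved
# LoRA checkpoint (hypothetical here).
outputs = llm.generate(
    ["Write a SQL query listing all users."],
    sampling_params,
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql_lora"),
)
for output in outputs:
    print(output.outputs[0].text)
```

Per-request adapters are what makes this "multi"-LoRA: requests carrying different `LoRARequest` ids can be served by one running engine.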
What's Changed
- Minor fix of type hint by @beginlner in https://github.com/vllm-project/vllm/pull/2340
- Build docker image with shared objects from "build" step by @payoto in https://github.com/vllm-project/vllm/pull/2237
- Ensure metrics are logged regardless of requests by @ichernev in https://github.com/vllm-project/vllm/pull/2347
- Changed scheduler to use deques instead of lists by @NadavShmayo in https://github.com/vllm-project/vllm/pull/2290
- Fix eager mode performance by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2377
- [Minor] Remove unused code in attention by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2384
- Add baichuan chat template jinja file by @EvilPsyCHo in https://github.com/vllm-project/vllm/pull/2390
- [Speculative decoding 1/9] Optimized rejection sampler by @cadedaniel in https://github.com/vllm-project/vllm/pull/2336
- Fix ipv4 ipv6 dualstack by @yunfeng-scale in https://github.com/vllm-project/vllm/pull/2408
- [Minor] Rename phi_1_5 to phi by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2385
- [DOC] Add additional comments for LLMEngine and AsyncLLMEngine by @litone01 in https://github.com/vllm-project/vllm/pull/1011
- [Minor] Fix the format in quick start guide related to Model Scope by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2425
- Add gradio chatbot for openai webserver by @arkohut in https://github.com/vllm-project/vllm/pull/2307
- [BUG] RuntimeError: deque mutated during iteration in abort_seq_group by @chenxu2048 in https://github.com/vllm-project/vllm/pull/2371
- Allow setting fastapi root_path argument by @chiragjn in https://github.com/vllm-project/vllm/pull/2341
- Address Phi modeling update 2 by @huiwy in https://github.com/vllm-project/vllm/pull/2428
- Show a more user-friendly error message, with more considerate advice for beginners, when using a V100 GPU (#1901) by @chuanzhubin in https://github.com/vllm-project/vllm/pull/2374
- Update quickstart.rst with small clarifying change (fix typo) by @nautsimon in https://github.com/vllm-project/vllm/pull/2369
- Aligning `top_p` and `top_k` Sampling by @chenxu2048 in https://github.com/vllm-project/vllm/pull/1885
- [Minor] Fix err msg by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2431
- [Minor] Optimize cuda graph memory usage by @esmeetu in https://github.com/vllm-project/vllm/pull/2437
- [CI] Add Buildkite by @simon-mo in https://github.com/vllm-project/vllm/pull/2355
- Announce the second vLLM meetup by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2444
- Allow buildkite to retry build on agent lost by @simon-mo in https://github.com/vllm-project/vllm/pull/2446
- Fix weight loading for GQA with TP by @zhangch9 in https://github.com/vllm-project/vllm/pull/2379
- CI: make sure benchmark script exit on error by @simon-mo in https://github.com/vllm-project/vllm/pull/2449
- ci: retry on build failure as well by @simon-mo in https://github.com/vllm-project/vllm/pull/2457
- Add StableLM3B model by @ita9naiwa in https://github.com/vllm-project/vllm/pull/2372
- OpenAI refactoring by @FlorianJoncour in https://github.com/vllm-project/vllm/pull/2360
- [Experimental] Prefix Caching Support by @caoshiyi in https://github.com/vllm-project/vllm/pull/1669
- fix stablelm.py tensor-parallel-size bug by @YingchaoX in https://github.com/vllm-project/vllm/pull/2482
- Minor fix in prefill cache example by @JasonZhu1313 in https://github.com/vllm-project/vllm/pull/2494
- fix: fix some args desc by @zspo in https://github.com/vllm-project/vllm/pull/2487
- [Neuron] Add an option to build with neuron by @liangfu in https://github.com/vllm-project/vllm/pull/2065
- Don't download both safetensor and bin files. by @NikolaBorisov in https://github.com/vllm-project/vllm/pull/2480
- [BugFix] Fix abort_seq_group by @beginlner in https://github.com/vllm-project/vllm/pull/2463
- refactor completion api for readability by @simon-mo in https://github.com/vllm-project/vllm/pull/2499
- Support OpenAI API server in `benchmark_serving.py` by @hmellor in https://github.com/vllm-project/vllm/pull/2172
- Simplify broadcast logic for control messages by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2501
- [Bugfix] fix load local safetensors model by @esmeetu in https://github.com/vllm-project/vllm/pull/2512
- Add benchmark serving to CI by @simon-mo in https://github.com/vllm-project/vllm/pull/2505
- Add `group` as an argument in broadcast ops by @GindaChen in https://github.com/vllm-project/vllm/pull/2522
- [Fix] Keep `scheduler.running` as deque by @njhill in https://github.com/vllm-project/vllm/pull/2523
- migrate pydantic from v1 to v2 by @joennlae in https://github.com/vllm-project/vllm/pull/2531
- [Speculative decoding 2/9] Multi-step worker for draft model by @cadedaniel in https://github.com/vllm-project/vllm/pull/2424
- Fix "Port could not be cast to integer value as
" by @pcmoritz in https://github.com/vllm-project/vllm/pull/2545 - Add qwen2 by @JustinLin610 in https://github.com/vllm-project/vllm/pull/2495
- Fix progress bar and allow HTTPS in `benchmark_serving.py` by @hmellor in https://github.com/vllm-project/vllm/pull/2552
- Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py by @JasonZhu1313 in https://github.com/vllm-project/vllm/pull/2553
- [Feature] Simple API token authentication by @taisazero in https://github.com/vllm-project/vllm/pull/1106
- Add multi-LoRA support by @Yard1 in https://github.com/vllm-project/vllm/pull/1804
- lint: format all python file instead of just source code by @simon-mo in https://github.com/vllm-project/vllm/pull/2567
- [Bugfix] fix crash if max_tokens=None by @NikolaBorisov in https://github.com/vllm-project/vllm/pull/2570
- Added `include_stop_str_in_output` and `length_penalty` parameters to OpenAI API by @galatolofederico in https://github.com/vllm-project/vllm/pull/2562
- [Doc] Fix the syntax error in the doc of supported_models by @keli-wen in https://github.com/vllm-project/vllm/pull/2584
- Support Batch Completion in Server by @simon-mo in https://github.com/vllm-project/vllm/pull/2529 (see the request example after this list)
- fix names and license by @JustinLin610 in https://github.com/vllm-project/vllm/pull/2589
- [Fix] Use a correct device when creating OptionalCUDAGuard by @sh1ng in https://github.com/vllm-project/vllm/pull/2583
- [ROCm] add support to ROCm 6.0 and MI300 by @hongxiayang in https://github.com/vllm-project/vllm/pull/2274
- Support for Stable LM 2 by @dakotamahan-stability in https://github.com/vllm-project/vllm/pull/2598
- Don't build punica kernels by default by @pcmoritz in https://github.com/vllm-project/vllm/pull/2605
- AWQ: Up to 2.66x higher throughput by @casper-hansen in https://github.com/vllm-project/vllm/pull/2566
- Use head_dim in config if exists by @xiangxu-google in https://github.com/vllm-project/vllm/pull/2622
- Custom all reduce kernels by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2192
- [Minor] Fix warning on Ray dependencies by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2630
- Speed up Punica compilation by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2632
- Small async_llm_engine refactor by @andoorve in https://github.com/vllm-project/vllm/pull/2618
- Update Ray version requirements by @simon-mo in https://github.com/vllm-project/vllm/pull/2636
- Support FP8-E5M2 KV Cache by @zhaoyang-star in https://github.com/vllm-project/vllm/pull/2279
- Fix error when tp > 1 by @zhaoyang-star in https://github.com/vllm-project/vllm/pull/2644
- No repeated IPC open by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2642
- ROCm: Allow setting compilation target by @rlrs in https://github.com/vllm-project/vllm/pull/2581
- DeepseekMoE support with Fused MoE kernel by @zwd003 in https://github.com/vllm-project/vllm/pull/2453
- Fused MOE for Mixtral by @pcmoritz in https://github.com/vllm-project/vllm/pull/2542
- Fix 'Actor methods cannot be called directly' when using `--engine-use-ray` by @HermitSun in https://github.com/vllm-project/vllm/pull/2664
- Add swap_blocks unit tests by @sh1ng in https://github.com/vllm-project/vllm/pull/2616
- Fix a small typo (tenosr -> tensor) by @pcmoritz in https://github.com/vllm-project/vllm/pull/2672
- [Minor] Fix false warning when TP=1 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2674
- Add quantized mixtral support by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2673
- Bump up version to v0.3.0 by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2656
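As referenced in the batch-completion item above (#2529), the OpenAI-compatible server now accepts a list of prompts in a single /v1/completions call. A minimal sketch, assuming a server is already running locally (e.g. via `python -m vllm.entrypoints.openai.api_server`); the URL and model name below are placeholders:

```python
# Send one completions request carrying several prompts at once.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # placeholder server URL
    json={
        "model": "meta-llama/Llama-2-7b-hf",  # placeholder model name
        "prompt": ["Hello, my name is", "The capital of France is"],  # batched
        "max_tokens": 16,
    },
)
for choice in resp.json()["choices"]:
    print(choice["text"])
```

Each prompt yields its own entry in `choices`, distinguished by its `index` field, mirroring the OpenAI completions schema.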
New Contributors
- @payoto made their first contribution in https://github.com/vllm-project/vllm/pull/2237
- @NadavShmayo made their first contribution in https://github.com/vllm-project/vllm/pull/2290
- @EvilPsyCHo made their first contribution in https://github.com/vllm-project/vllm/pull/2390
- @litone01 made their first contribution in https://github.com/vllm-project/vllm/pull/1011
- @arkohut made their first contribution in https://github.com/vllm-project/vllm/pull/2307
- @chiragjn made their first contribution in https://github.com/vllm-project/vllm/pull/2341
- @huiwy made their first contribution in https://github.com/vllm-project/vllm/pull/2428
- @chuanzhubin made their first contribution in https://github.com/vllm-project/vllm/pull/2374
- @nautsimon made their first contribution in https://github.com/vllm-project/vllm/pull/2369
- @zhangch9 made their first contribution in https://github.com/vllm-project/vllm/pull/2379
- @ita9naiwa made their first contribution in https://github.com/vllm-project/vllm/pull/2372
- @caoshiyi made their first contribution in https://github.com/vllm-project/vllm/pull/1669
- @YingchaoX made their first contribution in https://github.com/vllm-project/vllm/pull/2482
- @JasonZhu1313 made their first contribution in https://github.com/vllm-project/vllm/pull/2494
- @zspo made their first contribution in https://github.com/vllm-project/vllm/pull/2487
- @liangfu made their first contribution in https://github.com/vllm-project/vllm/pull/2065
- @NikolaBorisov made their first contribution in https://github.com/vllm-project/vllm/pull/2480
- @GindaChen made their first contribution in https://github.com/vllm-project/vllm/pull/2522
- @njhill made their first contribution in https://github.com/vllm-project/vllm/pull/2523
- @joennlae made their first contribution in https://github.com/vllm-project/vllm/pull/2531
- @pcmoritz made their first contribution in https://github.com/vllm-project/vllm/pull/2545
- @JustinLin610 made their first contribution in https://github.com/vllm-project/vllm/pull/2495
- @taisazero made their first contribution in https://github.com/vllm-project/vllm/pull/1106
- @galatolofederico made their first contribution in https://github.com/vllm-project/vllm/pull/2562
- @keli-wen made their first contribution in https://github.com/vllm-project/vllm/pull/2584
- @sh1ng made their first contribution in https://github.com/vllm-project/vllm/pull/2583
- @hongxiayang made their first contribution in https://github.com/vllm-project/vllm/pull/2274
- @dakotamahan-stability made their first contribution in https://github.com/vllm-project/vllm/pull/2598
- @xiangxu-google made their first contribution in https://github.com/vllm-project/vllm/pull/2622
- @andoorve made their first contribution in https://github.com/vllm-project/vllm/pull/2618
- @rlrs made their first contribution in https://github.com/vllm-project/vllm/pull/2581
- @zwd003 made their first contribution in https://github.com/vllm-project/vllm/pull/2453
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.2.7...v0.3.0
Assets
1. vllm-0.3.0+cu118-cp310-cp310-manylinux1_x86_64.whl (36.04 MB)
2. vllm-0.3.0+cu118-cp311-cp311-manylinux1_x86_64.whl (36.06 MB)
3. vllm-0.3.0+cu118-cp38-cp38-manylinux1_x86_64.whl (36.04 MB)
4. vllm-0.3.0+cu118-cp39-cp39-manylinux1_x86_64.whl (36.04 MB)
5. vllm-0.3.0-cp310-cp310-manylinux1_x86_64.whl (36.27 MB)
6. vllm-0.3.0-cp311-cp311-manylinux1_x86_64.whl (36.29 MB)
7. vllm-0.3.0-cp38-cp38-manylinux1_x86_64.whl (36.27 MB)
8. vllm-0.3.0-cp39-cp39-manylinux1_x86_64.whl (36.26 MB)