v0.3.0
Released: 2024-01-31 16:07:57
Latest vllm-project/vllm release: v0.4.1 (2024-04-24 10:28:08)
Major Changes
- Experimental multi-LoRA support (see the usage sketch after this list)
- Experimental prefix caching support
- FP8 KV Cache support
- Optimized MoE performance and Deepseek MoE support
- CI tested PRs
- Support batch completion in server
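The flagship features above can be exercised from the offline `LLM` API. Below is a minimal multi-LoRA sketch; the model name and adapter path are placeholders, not taken from this release note, and the FP8 KV cache from #2279 is opted into separately via the `kv_cache_dtype="fp8_e5m2"` engine argument.

```python
# Minimal multi-LoRA sketch against the offline LLM API.
# The base model and adapter path below are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora turns on the multi-LoRA path added in #1804.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request can carry its own adapter: the string names it, the int id
# identifies it inside the engine, and the path points at a locally saved
# LoRA checkpoint (hypothetical here).
outputs = llm.generate(
    ["Write a SQL query listing all users."],
    sampling_params,
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql_lora"),
)
for output in outputs:
    print(output.outputs[0].text)
```

Per-request adapters are what makes this "multi"-LoRA: requests carrying different `LoRARequest` ids can be served by one running engine.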
What's Changed
- Minor fix of type hint by @beginlner in https://github.com/vllm-project/vllm/pull/2340
- Build docker image with shared objects from "build" step by @payoto in https://github.com/vllm-project/vllm/pull/2237
- Ensure metrics are logged regardless of requests by @ichernev in https://github.com/vllm-project/vllm/pull/2347
- Changed scheduler to use deques instead of lists by @NadavShmayo in https://github.com/vllm-project/vllm/pull/2290
- Fix eager mode performance by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2377
- [Minor] Remove unused code in attention by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2384
- Add baichuan chat template jinja file by @EvilPsyCHo in https://github.com/vllm-project/vllm/pull/2390
- [Speculative decoding 1/9] Optimized rejection sampler by @cadedaniel in https://github.com/vllm-project/vllm/pull/2336
- Fix ipv4 ipv6 dualstack by @yunfeng-scale in https://github.com/vllm-project/vllm/pull/2408
- [Minor] Rename phi_1_5 to phi by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2385
- [DOC] Add additional comments for LLMEngine and AsyncLLMEngine by @litone01 in https://github.com/vllm-project/vllm/pull/1011
- [Minor] Fix the format in quick start guide related to Model Scope by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2425
- Add gradio chatbot for openai webserver by @arkohut in https://github.com/vllm-project/vllm/pull/2307
- [BUG] RuntimeError: deque mutated during iteration in abort_seq_group by @chenxu2048 in https://github.com/vllm-project/vllm/pull/2371
- Allow setting fastapi root_path argument by @chiragjn in https://github.com/vllm-project/vllm/pull/2341
- Address Phi modeling update 2 by @huiwy in https://github.com/vllm-project/vllm/pull/2428
- Show a more user-friendly error message, with more considerate advice for beginners, when using a V100 GPU (#1901) by @chuanzhubin in https://github.com/vllm-project/vllm/pull/2374
- Update quickstart.rst with small clarifying change (fix typo) by @nautsimon in https://github.com/vllm-project/vllm/pull/2369
- Aligning `top_p` and `top_k` Sampling by @chenxu2048 in https://github.com/vllm-project/vllm/pull/1885
- [Minor] Fix err msg by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2431
- [Minor] Optimize cuda graph memory usage by @esmeetu in https://github.com/vllm-project/vllm/pull/2437
- [CI] Add Buildkite by @simon-mo in https://github.com/vllm-project/vllm/pull/2355
- Announce the second vLLM meetup by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2444
- Allow buildkite to retry build on agent lost by @simon-mo in https://github.com/vllm-project/vllm/pull/2446
- Fix weight loading for GQA with TP by @zhangch9 in https://github.com/vllm-project/vllm/pull/2379
- CI: make sure benchmark script exit on error by @simon-mo in https://github.com/vllm-project/vllm/pull/2449
- ci: retry on build failure as well by @simon-mo in https://github.com/vllm-project/vllm/pull/2457
- Add StableLM3B model by @ita9naiwa in https://github.com/vllm-project/vllm/pull/2372
- OpenAI refactoring by @FlorianJoncour in https://github.com/vllm-project/vllm/pull/2360
- [Experimental] Prefix Caching Support by @caoshiyi in https://github.com/vllm-project/vllm/pull/1669
- fix stablelm.py tensor-parallel-size bug by @YingchaoX in https://github.com/vllm-project/vllm/pull/2482
- Minor fix in prefill cache example by @JasonZhu1313 in https://github.com/vllm-project/vllm/pull/2494
- fix: fix some args desc by @zspo in https://github.com/vllm-project/vllm/pull/2487
- [Neuron] Add an option to build with neuron by @liangfu in https://github.com/vllm-project/vllm/pull/2065
- Don't download both safetensor and bin files. by @NikolaBorisov in https://github.com/vllm-project/vllm/pull/2480
- [BugFix] Fix abort_seq_group by @beginlner in https://github.com/vllm-project/vllm/pull/2463
- refactor completion api for readability by @simon-mo in https://github.com/vllm-project/vllm/pull/2499
- Support OpenAI API server in `benchmark_serving.py` by @hmellor in https://github.com/vllm-project/vllm/pull/2172
- Simplify broadcast logic for control messages by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2501
- [Bugfix] fix load local safetensors model by @esmeetu in https://github.com/vllm-project/vllm/pull/2512
- Add benchmark serving to CI by @simon-mo in https://github.com/vllm-project/vllm/pull/2505
- Add `group` as an argument in broadcast ops by @GindaChen in https://github.com/vllm-project/vllm/pull/2522
- [Fix] Keep `scheduler.running` as deque by @njhill in https://github.com/vllm-project/vllm/pull/2523
- migrate pydantic from v1 to v2 by @joennlae in https://github.com/vllm-project/vllm/pull/2531
- [Speculative decoding 2/9] Multi-step worker for draft model by @cadedaniel in https://github.com/vllm-project/vllm/pull/2424
- Fix "Port could not be cast to integer value as
" by @pcmoritz in https://github.com/vllm-project/vllm/pull/2545 - Add qwen2 by @JustinLin610 in https://github.com/vllm-project/vllm/pull/2495
- Fix progress bar and allow HTTPS in `benchmark_serving.py` by @hmellor in https://github.com/vllm-project/vllm/pull/2552
- Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py by @JasonZhu1313 in https://github.com/vllm-project/vllm/pull/2553
- [Feature] Simple API token authentication by @taisazero in https://github.com/vllm-project/vllm/pull/1106
- Add multi-LoRA support by @Yard1 in https://github.com/vllm-project/vllm/pull/1804
- lint: format all python file instead of just source code by @simon-mo in https://github.com/vllm-project/vllm/pull/2567
- [Bugfix] fix crash if max_tokens=None by @NikolaBorisov in https://github.com/vllm-project/vllm/pull/2570
- Added `include_stop_str_in_output` and `length_penalty` parameters to OpenAI API by @galatolofederico in https://github.com/vllm-project/vllm/pull/2562
- [Doc] Fix the syntax error in the doc of supported_models by @keli-wen in https://github.com/vllm-project/vllm/pull/2584
- Support Batch Completion in Server by @simon-mo in https://github.com/vllm-project/vllm/pull/2529 (see the request example after this list)
- fix names and license by @JustinLin610 in https://github.com/vllm-project/vllm/pull/2589
- [Fix] Use a correct device when creating OptionalCUDAGuard by @sh1ng in https://github.com/vllm-project/vllm/pull/2583
- [ROCm] add support to ROCm 6.0 and MI300 by @hongxiayang in https://github.com/vllm-project/vllm/pull/2274
- Support for Stable LM 2 by @dakotamahan-stability in https://github.com/vllm-project/vllm/pull/2598
- Don't build punica kernels by default by @pcmoritz in https://github.com/vllm-project/vllm/pull/2605
- AWQ: Up to 2.66x higher throughput by @casper-hansen in https://github.com/vllm-project/vllm/pull/2566
- Use head_dim in config if exists by @xiangxu-google in https://github.com/vllm-project/vllm/pull/2622
- Custom all reduce kernels by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2192
- [Minor] Fix warning on Ray dependencies by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2630
- Speed up Punica compilation by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2632
- Small async_llm_engine refactor by @andoorve in https://github.com/vllm-project/vllm/pull/2618
- Update Ray version requirements by @simon-mo in https://github.com/vllm-project/vllm/pull/2636
- Support FP8-E5M2 KV Cache by @zhaoyang-star in https://github.com/vllm-project/vllm/pull/2279
- Fix error when tp > 1 by @zhaoyang-star in https://github.com/vllm-project/vllm/pull/2644
- No repeated IPC open by @hanzhi713 in https://github.com/vllm-project/vllm/pull/2642
- ROCm: Allow setting compilation target by @rlrs in https://github.com/vllm-project/vllm/pull/2581
- DeepseekMoE support with Fused MoE kernel by @zwd003 in https://github.com/vllm-project/vllm/pull/2453
- Fused MOE for Mixtral by @pcmoritz in https://github.com/vllm-project/vllm/pull/2542
- Fix 'Actor methods cannot be called directly' when using `--engine-use-ray` by @HermitSun in https://github.com/vllm-project/vllm/pull/2664
- Add swap_blocks unit tests by @sh1ng in https://github.com/vllm-project/vllm/pull/2616
- Fix a small typo (tenosr -> tensor) by @pcmoritz in https://github.com/vllm-project/vllm/pull/2672
- [Minor] Fix false warning when TP=1 by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2674
- Add quantized mixtral support by @WoosukKwon in https://github.com/vllm-project/vllm/pull/2673
- Bump up version to v0.3.0 by @zhuohan123 in https://github.com/vllm-project/vllm/pull/2656
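As referenced in the batch-completion item above (#2529), the OpenAI-compatible server now accepts a list of prompts in a single /v1/completions call. A minimal sketch, assuming a server is already running locally (e.g. via `python -m vllm.entrypoints.openai.api_server`); the URL and model name below are placeholders:

```python
# Send one completions request carrying several prompts at once.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # placeholder server URL
    json={
        "model": "meta-llama/Llama-2-7b-hf",  # placeholder model name
        "prompt": ["Hello, my name is", "The capital of France is"],  # batched
        "max_tokens": 16,
    },
)
for choice in resp.json()["choices"]:
    print(choice["text"])
```

Each prompt yields its own entry in `choices`, distinguished by its `index` field, mirroring the OpenAI completions schema.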
New Contributors
- @payoto made their first contribution in https://github.com/vllm-project/vllm/pull/2237
- @NadavShmayo made their first contribution in https://github.com/vllm-project/vllm/pull/2290
- @EvilPsyCHo made their first contribution in https://github.com/vllm-project/vllm/pull/2390
- @litone01 made their first contribution in https://github.com/vllm-project/vllm/pull/1011
- @arkohut made their first contribution in https://github.com/vllm-project/vllm/pull/2307
- @chiragjn made their first contribution in https://github.com/vllm-project/vllm/pull/2341
- @huiwy made their first contribution in https://github.com/vllm-project/vllm/pull/2428
- @chuanzhubin made their first contribution in https://github.com/vllm-project/vllm/pull/2374
- @nautsimon made their first contribution in https://github.com/vllm-project/vllm/pull/2369
- @zhangch9 made their first contribution in https://github.com/vllm-project/vllm/pull/2379
- @ita9naiwa made their first contribution in https://github.com/vllm-project/vllm/pull/2372
- @caoshiyi made their first contribution in https://github.com/vllm-project/vllm/pull/1669
- @YingchaoX made their first contribution in https://github.com/vllm-project/vllm/pull/2482
- @JasonZhu1313 made their first contribution in https://github.com/vllm-project/vllm/pull/2494
- @zspo made their first contribution in https://github.com/vllm-project/vllm/pull/2487
- @liangfu made their first contribution in https://github.com/vllm-project/vllm/pull/2065
- @NikolaBorisov made their first contribution in https://github.com/vllm-project/vllm/pull/2480
- @GindaChen made their first contribution in https://github.com/vllm-project/vllm/pull/2522
- @njhill made their first contribution in https://github.com/vllm-project/vllm/pull/2523
- @joennlae made their first contribution in https://github.com/vllm-project/vllm/pull/2531
- @pcmoritz made their first contribution in https://github.com/vllm-project/vllm/pull/2545
- @JustinLin610 made their first contribution in https://github.com/vllm-project/vllm/pull/2495
- @taisazero made their first contribution in https://github.com/vllm-project/vllm/pull/1106
- @galatolofederico made their first contribution in https://github.com/vllm-project/vllm/pull/2562
- @keli-wen made their first contribution in https://github.com/vllm-project/vllm/pull/2584
- @sh1ng made their first contribution in https://github.com/vllm-project/vllm/pull/2583
- @hongxiayang made their first contribution in https://github.com/vllm-project/vllm/pull/2274
- @dakotamahan-stability made their first contribution in https://github.com/vllm-project/vllm/pull/2598
- @xiangxu-google made their first contribution in https://github.com/vllm-project/vllm/pull/2622
- @andoorve made their first contribution in https://github.com/vllm-project/vllm/pull/2618
- @rlrs made their first contribution in https://github.com/vllm-project/vllm/pull/2581
- @zwd003 made their first contribution in https://github.com/vllm-project/vllm/pull/2453
Full Changelog: https://github.com/vllm-project/vllm/compare/v0.2.7...v0.3.0
Assets
1. vllm-0.3.0+cu118-cp310-cp310-manylinux1_x86_64.whl (36.04 MB)
2. vllm-0.3.0+cu118-cp311-cp311-manylinux1_x86_64.whl (36.06 MB)
3. vllm-0.3.0+cu118-cp38-cp38-manylinux1_x86_64.whl (36.04 MB)
4. vllm-0.3.0+cu118-cp39-cp39-manylinux1_x86_64.whl (36.04 MB)
5. vllm-0.3.0-cp310-cp310-manylinux1_x86_64.whl (36.27 MB)
6. vllm-0.3.0-cp311-cp311-manylinux1_x86_64.whl (36.29 MB)
7. vllm-0.3.0-cp38-cp38-manylinux1_x86_64.whl (36.27 MB)
8. vllm-0.3.0-cp39-cp39-manylinux1_x86_64.whl (36.26 MB)