v0.2.0
Release date: 2024-07-25 23:58:24
Latest release of sgl-project/sglang: v0.3.0 (2024-09-04 19:50:29)
Highlights
- We performed extensive engineering to improve the base performance. Compared to TensorRT-LLM and vLLM, SGLang now consistently delivers superior or competitive performance in both online and offline scenarios, handling models from Llama-8B to Llama-405B, on A100 and H100 GPUs, using FP8 and FP16. See the latest blog.
- New models: Llama3 405B, Deepseek MoE, InternLM, GPTBigCode, Mistral-Nemo
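To try one of the newly supported models, the snippet below is a minimal sketch using the SGLang frontend language. The model path, prompt, and generation parameters are illustrative placeholders (a chat-tuned checkpoint is assumed so the user/assistant roles map onto its chat template).

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Build a chat-style prompt and ask the model for a short answer.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Start a local runtime for one of the newly supported models.
# The model path is illustrative; any supported checkpoint works.
runtime = sgl.Runtime(model_path="mistralai/Mistral-Nemo-Instruct-2407")
sgl.set_default_backend(runtime)

state = qa.run(question="Summarize what SGLang does in one sentence.")
print(state["answer"])

runtime.shutdown()
```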
What's Changed
- Optimize mem indices mangement by @hnyls2002 in https://github.com/sgl-project/sglang/pull/619
- Unify index operations by @hnyls2002 in https://github.com/sgl-project/sglang/pull/620
- Simplify mem state by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/623
- Improve tensor parallel performance by @Ying1123 in https://github.com/sgl-project/sglang/pull/625
- Bump version to 0.1.21 by @Ying1123 in https://github.com/sgl-project/sglang/pull/626
- Fix model forward grad by @hnyls2002 in https://github.com/sgl-project/sglang/pull/628
- Update docker file by @Ying1123 in https://github.com/sgl-project/sglang/pull/629
- Disable NCCL_NVLS by default by @Ying1123 in https://github.com/sgl-project/sglang/pull/631
- Add qwen2 tie word embedding by @yileld in https://github.com/sgl-project/sglang/pull/630
- Add support for VertexAI safety settings by @AidanCooper in https://github.com/sgl-project/sglang/pull/624
- Fix vertexai by @hnyls2002 in https://github.com/sgl-project/sglang/pull/633
- Reduce docker size by @hnyls2002 in https://github.com/sgl-project/sglang/pull/632
- clean up step function by @Ying1123 in https://github.com/sgl-project/sglang/pull/635
- feat: support internlm2 by @zhyncs in https://github.com/sgl-project/sglang/pull/636
- misc: add pre-commit config by @zhyncs in https://github.com/sgl-project/sglang/pull/637
- misc: add issue and pr template by @zhyncs in https://github.com/sgl-project/sglang/pull/638
- Flashinfer sample kernel by @hnyls2002 in https://github.com/sgl-project/sglang/pull/617
- Move `global_server_args_dict` by @hnyls2002 in https://github.com/sgl-project/sglang/pull/642
- Increase the capacity of the memory pool by @Ying1123 in https://github.com/sgl-project/sglang/pull/643
- feat: add check_env by @zhyncs in https://github.com/sgl-project/sglang/pull/645
- Remove the dependency of rpyc by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/646
- misc: rm rpyc from PACKAGE_LIST by @zhyncs in https://github.com/sgl-project/sglang/pull/649
- fix: set ulimit -n 65535 by @zhyncs in https://github.com/sgl-project/sglang/pull/647
- feat: add lint workflow by @zhyncs in https://github.com/sgl-project/sglang/pull/648
- fix: resolve lint error by @zhyncs in https://github.com/sgl-project/sglang/pull/650
- Remove useless variables in infer_batch.py by @Ying1123 in https://github.com/sgl-project/sglang/pull/651
- Detokenize incrementally when streaming by @hnyls2002 in https://github.com/sgl-project/sglang/pull/653
- `TokenizerManager.context_len` should inherit from `server_args.conte… by @shrirajh in https://github.com/sgl-project/sglang/pull/654
- Remove cached triton launcher by @merrymercy in https://github.com/sgl-project/sglang/pull/656
- perf: reduce ttft and itl with stream_interval 1 by @zhyncs in https://github.com/sgl-project/sglang/pull/658
- feat: add benchmark serving by @zhyncs in https://github.com/sgl-project/sglang/pull/657
- refactor model loader [unreachable code]: initial refactor by @Ying1123 in https://github.com/sgl-project/sglang/pull/655
- misc: update SGLang package description by @zhyncs in https://github.com/sgl-project/sglang/pull/659
- Update Readme by @Ying1123 in https://github.com/sgl-project/sglang/pull/660
- feat: update check env by @zhyncs in https://github.com/sgl-project/sglang/pull/661
- Improve docs by @Ying1123 in https://github.com/sgl-project/sglang/pull/662
- Add benchmark instructions by @Ying1123 in https://github.com/sgl-project/sglang/pull/663
- Fix jump forward when streaming by @hnyls2002 in https://github.com/sgl-project/sglang/pull/665
- Fix kill process util by @ispobock in https://github.com/sgl-project/sglang/pull/666
- Add support for OpenAI API parallel sampling (see the usage sketch after this list) by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/640
- Update OpenAI API by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/667
- Temporary fix invalid sample results by @hnyls2002 in https://github.com/sgl-project/sglang/pull/668
- Support random dataset in bench_serving.py by @merrymercy in https://github.com/sgl-project/sglang/pull/669
- Revert "Temporary fix invalid sample results" by @hnyls2002 in https://github.com/sgl-project/sglang/pull/673
- refactor model loader: initial refactor by @Ying1123 in https://github.com/sgl-project/sglang/pull/664
- Fix cuda graph with flashinfer by @merrymercy in https://github.com/sgl-project/sglang/pull/675
- Tmp fix illegal sample by @hnyls2002 in https://github.com/sgl-project/sglang/pull/676
- Update version to 0.1.22 by @Ying1123 in https://github.com/sgl-project/sglang/pull/677
- Fallback when sampling failed by @ispobock in https://github.com/sgl-project/sglang/pull/678
- feat: support TRT LLM benchmark and multiple benchmarks by @zhyncs in https://github.com/sgl-project/sglang/pull/670
- Decouple kv by @hnyls2002 in https://github.com/sgl-project/sglang/pull/679
- Support gpt-bigcode model class by @hnyls2002 in https://github.com/sgl-project/sglang/pull/681
- support non-streaming benchmark by @merrymercy in https://github.com/sgl-project/sglang/pull/682
- Fix StreamExecutor.fork() losing the current role start index. by @max99x in https://github.com/sgl-project/sglang/pull/684
- feat: update bench serving by @zhyncs in https://github.com/sgl-project/sglang/pull/685
- misc: update output file logic by @zhyncs in https://github.com/sgl-project/sglang/pull/686
- Allow disabling streaming in bench by @merrymercy in https://github.com/sgl-project/sglang/pull/687
- docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/688
- Support Deepseek MoE Model by @hnyls2002 in https://github.com/sgl-project/sglang/pull/689
- misc: recommend to use chat model for benchmark by @zhyncs in https://github.com/sgl-project/sglang/pull/690
- Support Mistral-Nemo by @ispobock in https://github.com/sgl-project/sglang/pull/691
- docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/692
- fix: update bench serving by @zhyncs in https://github.com/sgl-project/sglang/pull/694
- misc: update output token logic by @zhyncs in https://github.com/sgl-project/sglang/pull/695
- Tune params by @Ying1123 in https://github.com/sgl-project/sglang/pull/696
- Fix trt benchmark by @Ying1123 in https://github.com/sgl-project/sglang/pull/697
- misc: fix typo by @zhyncs in https://github.com/sgl-project/sglang/pull/698
- Fix flashinfer by @Ying1123 in https://github.com/sgl-project/sglang/pull/700
- Fix hf config loading by @ispobock in https://github.com/sgl-project/sglang/pull/702
- Use min new token ratio at start by @hnyls2002 in https://github.com/sgl-project/sglang/pull/701
- feat: add e2e latency by @zhyncs in https://github.com/sgl-project/sglang/pull/704
- Update vllm version to support llama3.1 by @Ying1123 in https://github.com/sgl-project/sglang/pull/705
- bump version to 0.1.23 by @Ying1123 in https://github.com/sgl-project/sglang/pull/706
- Reduce hardcoded logic of kernel usage by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/707
- Fix multi-node deadlock by @merrymercy in https://github.com/sgl-project/sglang/pull/709
- Auto adjust new ratio by @hnyls2002 in https://github.com/sgl-project/sglang/pull/708
- Fix prefill size by @Ying1123 in https://github.com/sgl-project/sglang/pull/711
- docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/712
- docs: update doc by @zhyncs in https://github.com/sgl-project/sglang/pull/713
- fix: llama 3.1 405b fp8 by @zhyncs in https://github.com/sgl-project/sglang/pull/714
- misc: update doc by @zhyncs in https://github.com/sgl-project/sglang/pull/715
- Improve benchmark scripts by @Ying1123 in https://github.com/sgl-project/sglang/pull/717
- Bump version to 0.1.24 by @Ying1123 in https://github.com/sgl-project/sglang/pull/718
- docs: update supported models by @zhyncs in https://github.com/sgl-project/sglang/pull/719
- docs: update comment by @zhyncs in https://github.com/sgl-project/sglang/pull/721
- chore: add close inactive issues workflow by @zhyncs in https://github.com/sgl-project/sglang/pull/722
- misc: update build instruction by @zhyncs in https://github.com/sgl-project/sglang/pull/724
- fix: fp8 config by @Ying1123 in https://github.com/sgl-project/sglang/pull/723
- Fix dockerfile and triton cache manager by @hnyls2002 in https://github.com/sgl-project/sglang/pull/720
- chore: bump v0.1.25 by @zhyncs in https://github.com/sgl-project/sglang/pull/725
- fix: resolve the logo display issue on the PyPI page by @zhyncs in https://github.com/sgl-project/sglang/pull/726
- misc: update bug issue template by @zhyncs in https://github.com/sgl-project/sglang/pull/727
- Revert "fix: fp8 config" by @Ying1123 in https://github.com/sgl-project/sglang/pull/728
- Fix bugs (fp8 checkpoints, triton cache manager) by @Ying1123 in https://github.com/sgl-project/sglang/pull/729
- Bump version to 0.2.0 by @Ying1123 in https://github.com/sgl-project/sglang/pull/730
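As a usage illustration for the OpenAI API parallel sampling support added in #640 and #667 above, the sketch below requests several completions in a single call via the `n` parameter. It assumes an SGLang server is already running locally (e.g. launched with `python -m sglang.launch_server --model-path <model> --port 30000`); the port, model name, prompt, and sampling settings are placeholder assumptions.

```python
import openai

# Point the OpenAI client at the locally running SGLang server
# (base_url and api_key are placeholders for a local deployment).
client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "List three uses of a paperclip."}],
    n=3,            # parallel sampling: request three completions in one call
    max_tokens=64,
    temperature=0.8,
)

for i, choice in enumerate(response.choices):
    print(f"--- sample {i} ---")
    print(choice.message.content)
```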
New Contributors
- @yileld made their first contribution in https://github.com/sgl-project/sglang/pull/630
- @AidanCooper made their first contribution in https://github.com/sgl-project/sglang/pull/624
- @zhyncs made their first contribution in https://github.com/sgl-project/sglang/pull/636
- @shrirajh made their first contribution in https://github.com/sgl-project/sglang/pull/654
- @yichuan520030910320 made their first contribution in https://github.com/sgl-project/sglang/pull/640
- @max99x made their first contribution in https://github.com/sgl-project/sglang/pull/684
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.1.20...v0.2.0