v0.2.9

sgl-project/sglang

Release date: 2024-08-02 16:55:00


Highlights

New feature: Chunked prefill (#800, #811)
New model: DeepSeek-V2
Performance improvement: vectorized logprob computation
Accuracy fix: fix the double BOS problem in the chat template; move logits to float32; update flashinfer sampling kernels
Feature fix: fix many missing or broken logprob-related features in the OpenAI API server
CI/CD infra is now fully ready. The tests cover frontend, backend, accuracy, and performance tests.
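To illustrate the idea behind chunked prefill, here is a minimal sketch that splits a long prompt into consecutive token ranges of at most `chunk_size` tokens. Each range would be prefilled in a separate scheduler step, so a long prompt does not monopolize a whole forward pass. The function name `prefill_chunks` and its parameters are hypothetical and do not represent SGLang's scheduler. The requirement for a finite, positive chunk size is an assumption that mirrors the "Remove inf value for chunked prefill size" change.

```python
from typing import Iterator, Tuple


def prefill_chunks(num_prompt_tokens: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token ranges that cover a prompt in chunks of at most chunk_size.

    Hypothetical helper for illustration only. Each chunk attends to the KV cache
    written by all earlier chunks, so the ranges must be processed in order.
    """
    # An unbounded (infinite) or non-positive chunk size makes chunking meaningless.
    if not isinstance(chunk_size, int) or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, num_prompt_tokens, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, num_prompt_tokens)


if __name__ == "__main__":
    # A 10-token prompt with a chunk size of 4 is prefilled in three steps.
    print(list(prefill_chunks(10, 4)))
```

Between chunks, a real scheduler can run decode steps for other requests, which is the main latency benefit of this technique.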

What's Changed

Deepseek v2 support by @hnyls2002 in https://github.com/sgl-project/sglang/pull/693
Fix context length by @hnyls2002 in https://github.com/sgl-project/sglang/pull/757
docs: update model support by @zhyncs in https://github.com/sgl-project/sglang/pull/760
fix: not run workflows on fork repo by @zhyncs in https://github.com/sgl-project/sglang/pull/762
Update supported models by @hnyls2002 in https://github.com/sgl-project/sglang/pull/763
Fix TransformerTokenizer init for chatglm2 & 3 by @ispobock in https://github.com/sgl-project/sglang/pull/761
[Minor] Improve the code style in TokenizerManager by @merrymercy in https://github.com/sgl-project/sglang/pull/767
Update readme by @Ying1123 in https://github.com/sgl-project/sglang/pull/769
feat: add fake tag by @zhyncs in https://github.com/sgl-project/sglang/pull/770
Fix max_tokens for OpenAI chat completion API by @merrymercy in https://github.com/sgl-project/sglang/pull/766
Fix max new tokens by @merrymercy in https://github.com/sgl-project/sglang/pull/772
Move sampling logits to float32 by @merrymercy in https://github.com/sgl-project/sglang/pull/773
minor refactor: move check server args to server_args.py by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/774
Fix return_log_probs with cuda graph by @merrymercy in https://github.com/sgl-project/sglang/pull/775
Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs by @merrymercy in https://github.com/sgl-project/sglang/pull/776
Allow disabling flashinfer sampling kernel by @merrymercy in https://github.com/sgl-project/sglang/pull/778
Bump version to 0.2.6 by @merrymercy in https://github.com/sgl-project/sglang/pull/779
fix: replace pillow with PIL in PACKAGE_LIST by @zhyncs in https://github.com/sgl-project/sglang/pull/781
docs: init readthedocs support by @zhyncs in https://github.com/sgl-project/sglang/pull/783
fix: init readthedocs support by @zhyncs in https://github.com/sgl-project/sglang/pull/784
fix: exclude logo png in gitignore by @zhyncs in https://github.com/sgl-project/sglang/pull/785
docs: update index by @zhyncs in https://github.com/sgl-project/sglang/pull/786
Vectorize logprobs computation by @Ying1123 in https://github.com/sgl-project/sglang/pull/787
docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/788
docs: make badges center by @zhyncs in https://github.com/sgl-project/sglang/pull/789
chore: add copyright for srt by @zhyncs in https://github.com/sgl-project/sglang/pull/790
Fix echo + logprob for OpenAI API when the prompt is a list by @Ying1123 in https://github.com/sgl-project/sglang/pull/791
Update README.md by @Ying1123 in https://github.com/sgl-project/sglang/pull/792
Lazy-import third-party backends by @bgyoon in https://github.com/sgl-project/sglang/pull/794
Fix lazy import location by @Ying1123 in https://github.com/sgl-project/sglang/pull/795
Fix logging by @Ying1123 in https://github.com/sgl-project/sglang/pull/796
Add role documentation, add system begin & end tokens by @objnf-dev in https://github.com/sgl-project/sglang/pull/793
Chunked prefill support by @hnyls2002 in https://github.com/sgl-project/sglang/pull/797
Revert "Chunked prefill support" by @Ying1123 in https://github.com/sgl-project/sglang/pull/799
Chunked prefill by @hnyls2002 in https://github.com/sgl-project/sglang/pull/800
fix: update flashinfer to 0.1.2 to fix sampling for cu118 by @zhyncs in https://github.com/sgl-project/sglang/pull/803
Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" by @Ying1123 in https://github.com/sgl-project/sglang/pull/805
feat: add chat template for internlm2-chat by @zhyncs in https://github.com/sgl-project/sglang/pull/802
Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" by @Ying1123 in https://github.com/sgl-project/sglang/pull/806
Add support for OpenAI API : offline batch(file) processing by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/699
Organize public APIs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/809
Remove inf value for chunked prefill size by @hnyls2002 in https://github.com/sgl-project/sglang/pull/812
Revert "Organize public APIs" by @Ying1123 in https://github.com/sgl-project/sglang/pull/815
fix: use v0.2.5 for benchmark by @zhyncs in https://github.com/sgl-project/sglang/pull/814
Fix LiteLLM kwargs by @qeternity in https://github.com/sgl-project/sglang/pull/817
Code structure refactor by @hnyls2002 in https://github.com/sgl-project/sglang/pull/807
docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/819
Fix streaming bug by @objnf-dev in https://github.com/sgl-project/sglang/pull/820
feat: add runner by @zhyncs in https://github.com/sgl-project/sglang/pull/821
feat: add pr e2e test by @zhyncs in https://github.com/sgl-project/sglang/pull/822
Support disable_ignore_eos in bench_serving.py by @Ying1123 in https://github.com/sgl-project/sglang/pull/824
Adjust default mem fraction to avoid OOM by @Ying1123 in https://github.com/sgl-project/sglang/pull/823
Add awq_marlin by @Ying1123 in https://github.com/sgl-project/sglang/pull/826
misc: update e2e test benchmark config by @zhyncs in https://github.com/sgl-project/sglang/pull/825
misc: enable e2e test when push by @zhyncs in https://github.com/sgl-project/sglang/pull/828
docs: add set up runner by @zhyncs in https://github.com/sgl-project/sglang/pull/829
chore: bump v0.2.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/830
Add --max-total-tokens by @hnyls2002 in https://github.com/sgl-project/sglang/pull/840
Fix List input bug by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/838
Add req slots leaking check by @hnyls2002 in https://github.com/sgl-project/sglang/pull/842
docs: update README.md by @eltociear in https://github.com/sgl-project/sglang/pull/843
misc: update e2e test paths config by @zhyncs in https://github.com/sgl-project/sglang/pull/848
chore: update flashinfer to v0.1.3 by @zhyncs in https://github.com/sgl-project/sglang/pull/850
Fix llama for classification by @Ying1123 in https://github.com/sgl-project/sglang/pull/855
Add troubleshooting doc by @Ying1123 in https://github.com/sgl-project/sglang/pull/856
Fix #857 by @kaifronsdal in https://github.com/sgl-project/sglang/pull/858
Add support for logprobs in OpenAI chat API by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/852
Support chunked prefill when radix cache is disabled by @hnyls2002 in https://github.com/sgl-project/sglang/pull/811
misc: update e2e test paths config by @zhyncs in https://github.com/sgl-project/sglang/pull/860
Rename github workflows by @Ying1123 in https://github.com/sgl-project/sglang/pull/861
misc: disable auto release by @zhyncs in https://github.com/sgl-project/sglang/pull/862
misc: add cancel previous at e2e by @zhyncs in https://github.com/sgl-project/sglang/pull/864
Add OpenAI backend to the CI test by @Ying1123 in https://github.com/sgl-project/sglang/pull/869
Fix openai CI tests by @Ying1123 in https://github.com/sgl-project/sglang/pull/870
misc: use pip cache purge and add unit test ci by @zhyncs in https://github.com/sgl-project/sglang/pull/871
misc: update unit test config by @zhyncs in https://github.com/sgl-project/sglang/pull/873
Fix unit tests for the frontend language part by @Ying1123 in https://github.com/sgl-project/sglang/pull/872
bump to 0.2.8 by @Ying1123 in https://github.com/sgl-project/sglang/pull/877
Make scripts under /test/srt as unit tests by @Ying1123 in https://github.com/sgl-project/sglang/pull/875
Update runner docs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/876
Improve the coverage of the openai api server test by @Ying1123 in https://github.com/sgl-project/sglang/pull/878
Implement served_model_name to customize model id when use local mode… by @dionren in https://github.com/sgl-project/sglang/pull/749
Update runner docs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/879
Add more unit tests to CI by @Ying1123 in https://github.com/sgl-project/sglang/pull/880
Add accuracy test to CI: MMLU by @Ying1123 in https://github.com/sgl-project/sglang/pull/882
Update workflow name by @Ying1123 in https://github.com/sgl-project/sglang/pull/883
Fix the double BOS problem in the HF chat template by @Ying1123 in https://github.com/sgl-project/sglang/pull/888
Add benchmark: HumanEval by @Ying1123 in https://github.com/sgl-project/sglang/pull/889
Increase openai client limit by @Ying1123 in https://github.com/sgl-project/sglang/pull/886
Bump version to v0.2.9 by @Ying1123 in https://github.com/sgl-project/sglang/pull/890
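The "double BOS" problem fixed in #888 typically arises when a Hugging Face chat template already renders the BOS token into the prompt text and the tokenizer prepends a second BOS while encoding it. The sketch below shows one generic remedy: collapse a leading run of BOS ids to a single token. The BOS id `1` and the helper name `strip_duplicate_bos` are hypothetical, and this is not necessarily how #888 implements the fix.

```python
from typing import List

BOS_ID = 1  # hypothetical BOS token id; real tokenizers define their own


def strip_duplicate_bos(token_ids: List[int], bos_id: int = BOS_ID) -> List[int]:
    """Collapse a run of leading BOS tokens into a single BOS (illustrative helper)."""
    # Count how many BOS tokens appear at the very start of the sequence.
    i = 0
    while i < len(token_ids) and token_ids[i] == bos_id:
        i += 1
    # Zero or one leading BOS: nothing to fix.
    if i <= 1:
        return list(token_ids)
    # Keep exactly one BOS followed by the rest of the prompt.
    return [bos_id] + token_ids[i:]


if __name__ == "__main__":
    # Template-rendered BOS plus tokenizer-added BOS yields two leading BOS ids.
    print(strip_duplicate_bos([1, 1, 450, 29871]))
```

Another option is to avoid the duplicate in the first place by encoding the rendered template with `add_special_tokens=False` in Hugging Face `transformers`.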

New Contributors

@bgyoon made their first contribution in https://github.com/sgl-project/sglang/pull/794
@objnf-dev made their first contribution in https://github.com/sgl-project/sglang/pull/793
@kaifronsdal made their first contribution in https://github.com/sgl-project/sglang/pull/858
@dionren made their first contribution in https://github.com/sgl-project/sglang/pull/749

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.2.5...v0.2.9
