v0.2.9
Release date: 2024-08-02 16:55:00
Latest sgl-project/sglang release: v0.3.0 (2024-09-04 19:50:29)
Highlights
- New feature: Chunked prefill (#800, #811)
- New models: Deepseek v2
- Performance improvement: vectorized logprob computation
- Accuracy fix: fix the double BOS problem in the chat template; move logits to float32; update flashinfer sampling kernels
- Feature fix: fixed many missing logprob-related features in the OpenAI API server
- CI/CD infra is now fully ready. The tests cover the frontend, the backend, accuracy, and performance.
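To make the chunked prefill highlight concrete, here is a minimal sketch of the general idea: a long prompt is processed in fixed-size pieces instead of one large forward pass. This is not SGLang's implementation; the function names `split_into_chunks` and `chunked_prefill` and the `cached_len` counter are hypothetical and exist only for illustration.

```python
from typing import Iterator, List


def split_into_chunks(token_ids: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of token_ids, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(token_ids), chunk_size):
        yield token_ids[start:start + chunk_size]


def chunked_prefill(token_ids: List[int], chunk_size: int) -> int:
    """Toy prefill loop that walks the prompt chunk by chunk.

    cached_len stands in for the length of the KV cache. A real engine
    would run a forward pass on each chunk, attending to everything
    cached so far, and then extend the cache.
    """
    cached_len = 0
    for chunk in split_into_chunks(token_ids, chunk_size):
        # Placeholder for "run the model on this chunk and extend the cache".
        cached_len += len(chunk)
    return cached_len


prompt = list(range(10))  # ten fake token ids
chunks = list(split_into_chunks(prompt, 4))
total = chunked_prefill(prompt, 4)
```

With a chunk size of 4, the ten-token prompt is handled as three pieces of lengths 4, 4, and 2, which bounds the amount of work done in any single step.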
What's Changed
- Deepseek v2 support by @hnyls2002 in https://github.com/sgl-project/sglang/pull/693
- Fix context length by @hnyls2002 in https://github.com/sgl-project/sglang/pull/757
- docs: update model support by @zhyncs in https://github.com/sgl-project/sglang/pull/760
- fix: not run workflows on fork repo by @zhyncs in https://github.com/sgl-project/sglang/pull/762
- Update supported models by @hnyls2002 in https://github.com/sgl-project/sglang/pull/763
- Fix TransformerTokenizer init for chatglm2 & 3 by @ispobock in https://github.com/sgl-project/sglang/pull/761
- [Minor] Improve the code style in TokenizerManager by @merrymercy in https://github.com/sgl-project/sglang/pull/767
- Update readme by @Ying1123 in https://github.com/sgl-project/sglang/pull/769
- feat: add fake tag by @zhyncs in https://github.com/sgl-project/sglang/pull/770
- Fix max_tokens for OpenAI chat completion API by @merrymercy in https://github.com/sgl-project/sglang/pull/766
- Fix max new tokens by @merrymercy in https://github.com/sgl-project/sglang/pull/772
- Move sampling logits to float32 by @merrymercy in https://github.com/sgl-project/sglang/pull/773
- minor refactor: move check server args to server_args.py by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/774
- Fix return_log_probs with cuda graph by @merrymercy in https://github.com/sgl-project/sglang/pull/775
- Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs by @merrymercy in https://github.com/sgl-project/sglang/pull/776
- Allow disabling flashinfer sampling kernel by @merrymercy in https://github.com/sgl-project/sglang/pull/778
- Bump version to 0.2.6 by @merrymercy in https://github.com/sgl-project/sglang/pull/779
- fix: replace pillow with PIL in PACKAGE_LIST by @zhyncs in https://github.com/sgl-project/sglang/pull/781
- docs: init readthedocs support by @zhyncs in https://github.com/sgl-project/sglang/pull/783
- fix: init readthedocs support by @zhyncs in https://github.com/sgl-project/sglang/pull/784
- fix: exclude logo png in gitignore by @zhyncs in https://github.com/sgl-project/sglang/pull/785
- docs: update index by @zhyncs in https://github.com/sgl-project/sglang/pull/786
- Vectorize logprobs computation by @Ying1123 in https://github.com/sgl-project/sglang/pull/787
- docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/788
- docs: make badges center by @zhyncs in https://github.com/sgl-project/sglang/pull/789
- chore: add copyright for srt by @zhyncs in https://github.com/sgl-project/sglang/pull/790
- Fix echo + logprob for OpenAI API when the prompt is a list by @Ying1123 in https://github.com/sgl-project/sglang/pull/791
- Update README.md by @Ying1123 in https://github.com/sgl-project/sglang/pull/792
- Lazy-import third-party backends by @bgyoon in https://github.com/sgl-project/sglang/pull/794
- Fix lazy import location by @Ying1123 in https://github.com/sgl-project/sglang/pull/795
- Fix logging by @Ying1123 in https://github.com/sgl-project/sglang/pull/796
- Add role documentation, add system begin & end tokens by @objnf-dev in https://github.com/sgl-project/sglang/pull/793
- Chunked prefill support by @hnyls2002 in https://github.com/sgl-project/sglang/pull/797
- Revert "Chunked prefill support" by @Ying1123 in https://github.com/sgl-project/sglang/pull/799
- Chunked prefill by @hnyls2002 in https://github.com/sgl-project/sglang/pull/800
- fix: update flashinfer to 0.1.2 to fix sampling for cu118 by @zhyncs in https://github.com/sgl-project/sglang/pull/803
- Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" by @Ying1123 in https://github.com/sgl-project/sglang/pull/805
- feat: add chat template for internlm2-chat by @zhyncs in https://github.com/sgl-project/sglang/pull/802
- Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" by @Ying1123 in https://github.com/sgl-project/sglang/pull/806
- Add support for OpenAI API : offline batch(file) processing by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/699
- Organize public APIs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/809
- Remove inf value for chunked prefill size by @hnyls2002 in https://github.com/sgl-project/sglang/pull/812
- Revert "Organize public APIs" by @Ying1123 in https://github.com/sgl-project/sglang/pull/815
- fix: use v0.2.5 for benchmark by @zhyncs in https://github.com/sgl-project/sglang/pull/814
- Fix LiteLLM kwargs by @qeternity in https://github.com/sgl-project/sglang/pull/817
- Code structure refactor by @hnyls2002 in https://github.com/sgl-project/sglang/pull/807
- docs: update README by @zhyncs in https://github.com/sgl-project/sglang/pull/819
- Fix streaming bug by @objnf-dev in https://github.com/sgl-project/sglang/pull/820
- feat: add runner by @zhyncs in https://github.com/sgl-project/sglang/pull/821
- feat: add pr e2e test by @zhyncs in https://github.com/sgl-project/sglang/pull/822
- Support disable_ignore_eos in bench_serving.py by @Ying1123 in https://github.com/sgl-project/sglang/pull/824
- Adjust default mem fraction to avoid OOM by @Ying1123 in https://github.com/sgl-project/sglang/pull/823
- Add awq_marlin by @Ying1123 in https://github.com/sgl-project/sglang/pull/826
- misc: update e2e test benchmark config by @zhyncs in https://github.com/sgl-project/sglang/pull/825
- misc: enable e2e test when push by @zhyncs in https://github.com/sgl-project/sglang/pull/828
- docs: add set up runner by @zhyncs in https://github.com/sgl-project/sglang/pull/829
- chore: bump v0.2.7 by @zhyncs in https://github.com/sgl-project/sglang/pull/830
- Add --max-total-tokens by @hnyls2002 in https://github.com/sgl-project/sglang/pull/840
- Fix List input bug by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/838
- Add req slots leaking check by @hnyls2002 in https://github.com/sgl-project/sglang/pull/842
- docs: update README.md by @eltociear in https://github.com/sgl-project/sglang/pull/843
- misc: update e2e test paths config by @zhyncs in https://github.com/sgl-project/sglang/pull/848
- chore: update flashinfer to v0.1.3 by @zhyncs in https://github.com/sgl-project/sglang/pull/850
- Fix llama for classification by @Ying1123 in https://github.com/sgl-project/sglang/pull/855
- Add troubleshooting doc by @Ying1123 in https://github.com/sgl-project/sglang/pull/856
- Fix #857 by @kaifronsdal in https://github.com/sgl-project/sglang/pull/858
- Add support for logprobs in OpenAI chat API by @yichuan520030910320 in https://github.com/sgl-project/sglang/pull/852
- Support chunked prefill when radix cache is disabled by @hnyls2002 in https://github.com/sgl-project/sglang/pull/811
- misc: update e2e test paths config by @zhyncs in https://github.com/sgl-project/sglang/pull/860
- Rename github workflows by @Ying1123 in https://github.com/sgl-project/sglang/pull/861
- misc: disable auto release by @zhyncs in https://github.com/sgl-project/sglang/pull/862
- misc: add cancel previous at e2e by @zhyncs in https://github.com/sgl-project/sglang/pull/864
- Add OpenAI backend to the CI test by @Ying1123 in https://github.com/sgl-project/sglang/pull/869
- Fix openai CI tests by @Ying1123 in https://github.com/sgl-project/sglang/pull/870
- misc: use pip cache purge and add unit test ci by @zhyncs in https://github.com/sgl-project/sglang/pull/871
- misc: update unit test config by @zhyncs in https://github.com/sgl-project/sglang/pull/873
- Fix unit tests for the frontend language part by @Ying1123 in https://github.com/sgl-project/sglang/pull/872
- bump to 0.2.8 by @Ying1123 in https://github.com/sgl-project/sglang/pull/877
- Make scripts under /test/srt as unit tests by @Ying1123 in https://github.com/sgl-project/sglang/pull/875
- Update runner docs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/876
- Improve the coverage of the openai api server test by @Ying1123 in https://github.com/sgl-project/sglang/pull/878
- Implement served_model_name to customize model id when use local mode… by @dionren in https://github.com/sgl-project/sglang/pull/749
- Update runner docs by @hnyls2002 in https://github.com/sgl-project/sglang/pull/879
- Add more unit tests to CI by @Ying1123 in https://github.com/sgl-project/sglang/pull/880
- Add accuracy test to CI: MMLU by @Ying1123 in https://github.com/sgl-project/sglang/pull/882
- Update workflow name by @Ying1123 in https://github.com/sgl-project/sglang/pull/883
- Fix the double BOS problem in the HF chat template by @Ying1123 in https://github.com/sgl-project/sglang/pull/888
- Add benchmark: HumanEval by @Ying1123 in https://github.com/sgl-project/sglang/pull/889
- Increase openai client limit by @Ying1123 in https://github.com/sgl-project/sglang/pull/886
- Bump version to v0.2.9 by @Ying1123 in https://github.com/sgl-project/sglang/pull/890
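The double BOS problem mentioned in the list above typically arises when a chat template already emits a beginning-of-sequence token and the tokenizer then prepends another one. The sketch below shows one generic way to guard against it on the token-id level. It is an assumption-laden illustration, not the change made in the linked pull request; the helper name `drop_duplicate_bos` and the BOS id `1` are hypothetical.

```python
from typing import List


def drop_duplicate_bos(token_ids: List[int], bos_id: int) -> List[int]:
    """Collapse a run of leading BOS tokens into a single BOS token."""
    i = 0
    # Advance while the current and the next token are both BOS.
    while (
        i + 1 < len(token_ids)
        and token_ids[i] == bos_id
        and token_ids[i + 1] == bos_id
    ):
        i += 1
    return token_ids[i:]


BOS = 1  # hypothetical BOS id, for illustration only
cleaned = drop_duplicate_bos([BOS, BOS, 5, 6], BOS)
```

Sequences that start with a single BOS, or with no BOS at all, pass through unchanged.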
New Contributors
- @bgyoon made their first contribution in https://github.com/sgl-project/sglang/pull/794
- @objnf-dev made their first contribution in https://github.com/sgl-project/sglang/pull/793
- @kaifronsdal made their first contribution in https://github.com/sgl-project/sglang/pull/858
- @dionren made their first contribution in https://github.com/sgl-project/sglang/pull/749
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.2.5...v0.2.9