v0.1.16
Release date: 2024-05-14 08:36:05
Latest sgl-project/sglang release: v0.3.0 (2024-09-04 19:50:29)
Highlight
- Support more models: DBRX, Command-R, Gemma
- Support llava-video (#423, https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)
- Cache performance improvements (#418, #364)
- Marlin quantization kernels
- Many bug fixes
- Update dependencies to be compatible with their latest versions
What's Changed
- Fix Runtime missing some ServerArgs options by @Qubitium in https://github.com/sgl-project/sglang/pull/281
- adding the triton docker build minimal example by @amirarsalan90 in https://github.com/sgl-project/sglang/pull/242
- Fix flashinfer >= 0.0.3 compat by @Qubitium in https://github.com/sgl-project/sglang/pull/282
- Fix Incorrect CURL Request Example in README by @amirarsalan90 in https://github.com/sgl-project/sglang/pull/287
- enable marlin kernels by @qeternity in https://github.com/sgl-project/sglang/pull/286
- Fix env (docker) compat due to file usage by @Qubitium in https://github.com/sgl-project/sglang/pull/288
- Fix marlin model loading compat with autogptq by @Liurl21 in https://github.com/sgl-project/sglang/pull/290
- Fix outlines-0.0.35 incompatibility by @ZhouGongZaiShi in https://github.com/sgl-project/sglang/pull/291
- [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models by @Luodian in https://github.com/sgl-project/sglang/pull/311
- Use Anthropic messages API by @janimo in https://github.com/sgl-project/sglang/pull/304
- Add StableLM model. by @janimo in https://github.com/sgl-project/sglang/pull/301
- Support oai in benchmark/mmlu by @merrymercy in https://github.com/sgl-project/sglang/pull/323
- Update version to v0.1.14 by @merrymercy in https://github.com/sgl-project/sglang/pull/324
- Cleanup codebase: removed unnecessary code/logic by @Qubitium in https://github.com/sgl-project/sglang/pull/298
- Update dependencies by @janimo in https://github.com/sgl-project/sglang/pull/326
- Openrouter usage example by @janimo in https://github.com/sgl-project/sglang/pull/327
- model_rpc style improvement by @hnyls2002 in https://github.com/sgl-project/sglang/pull/293
- model_runner simplify by @hnyls2002 in https://github.com/sgl-project/sglang/pull/329
- Logprobs Refractor by @hnyls2002 in https://github.com/sgl-project/sglang/pull/331
- DBRX support by @hnyls2002 in https://github.com/sgl-project/sglang/pull/337
- Add support for new autogptq quant_config.checkpoint_format by @Qubitium in https://github.com/sgl-project/sglang/pull/332
- Fix llava parallelism/fork bug by @lockon-n in https://github.com/sgl-project/sglang/pull/315
- Eliminate 2 gpu ops during sampling when logit_bias is zero by @hnyls2002 in https://github.com/sgl-project/sglang/pull/343
- Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" by @hnyls2002 in https://github.com/sgl-project/sglang/pull/345
- Eliminate 2 gpu ops during sampling when logit_bias is zero by @Qubitium in https://github.com/sgl-project/sglang/pull/338
- Add timeout to get_meta_info by @SimoneRaponi in https://github.com/sgl-project/sglang/pull/346
- Fix typos in infer_batch.py by @tom-doerr in https://github.com/sgl-project/sglang/pull/354
- Time cost utils by @hnyls2002 in https://github.com/sgl-project/sglang/pull/355
- Update README.md by @eltociear in https://github.com/sgl-project/sglang/pull/358
- support command-r by @ZhouXingg in https://github.com/sgl-project/sglang/pull/369
- Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) by @fronx in https://github.com/sgl-project/sglang/pull/368
- Update model support in readme by @Ying1123 in https://github.com/sgl-project/sglang/pull/370
- Optimize radix tree matching by @ispobock in https://github.com/sgl-project/sglang/pull/364
- Reduce overhead when fork(1) by @hnyls2002 in https://github.com/sgl-project/sglang/pull/375
- llama3 instruct template by @qeternity in https://github.com/sgl-project/sglang/pull/372
- add .isort.cfg by @hnyls2002 in https://github.com/sgl-project/sglang/pull/378
- Revert removing the unused imports by @hnyls2002 in https://github.com/sgl-project/sglang/pull/385
- Benchmark Updates by @hnyls2002 in https://github.com/sgl-project/sglang/pull/382
- Improve performance when running with full parallel by @hnyls2002 in https://github.com/sgl-project/sglang/pull/394
- Minor: style improvement of radix_cache and memory_pool by @hnyls2002 in https://github.com/sgl-project/sglang/pull/395
- Format Benchmark Code by @hnyls2002 in https://github.com/sgl-project/sglang/pull/399
- Fix chatml template by @merrymercy in https://github.com/sgl-project/sglang/pull/406
- Adding RAG tracing & eval cookbook using Parea by @joschkabraun in https://github.com/sgl-project/sglang/pull/390
- SamplingParams add "spaces_between_special_tokens" argument by @ZhouXingg in https://github.com/sgl-project/sglang/pull/392
- Organize Benchmark by @hnyls2002 in https://github.com/sgl-project/sglang/pull/381
- Add Cohere Command R chat template by @noah-kim-theori in https://github.com/sgl-project/sglang/pull/411
- Fix sync() when fork(1) by @hnyls2002 in https://github.com/sgl-project/sglang/pull/412
- Include finish reason in meta info response by @qeternity in https://github.com/sgl-project/sglang/pull/415
- Make public APIs more standard. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/416
- Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 by @Qubitium in https://github.com/sgl-project/sglang/pull/380
- Optimize the memory usage of logits processor by @merrymercy in https://github.com/sgl-project/sglang/pull/420
- Clean up by @merrymercy in https://github.com/sgl-project/sglang/pull/422
- Fix logit processor bugs by @merrymercy in https://github.com/sgl-project/sglang/pull/427
- Minor fix for the import path by @merrymercy in https://github.com/sgl-project/sglang/pull/428
- Move openai api server into a separate file by @merrymercy in https://github.com/sgl-project/sglang/pull/429
- Fix flashinfer by @merrymercy in https://github.com/sgl-project/sglang/pull/430
- Update version to 0.1.15 by @merrymercy in https://github.com/sgl-project/sglang/pull/431
- Misc fixes by @merrymercy in https://github.com/sgl-project/sglang/pull/432
- Allow input_ids in the input of the /generate endpoint by @lolipopshock in https://github.com/sgl-project/sglang/pull/363
- Improve error handling by @merrymercy in https://github.com/sgl-project/sglang/pull/433
- Cache optimizations by @hnyls2002 in https://github.com/sgl-project/sglang/pull/418
- Update readme by @merrymercy in https://github.com/sgl-project/sglang/pull/434
- Raise errors for prompts that are too long by @merrymercy in https://github.com/sgl-project/sglang/pull/436
- support llava video by @ZhangYuanhan-AI in https://github.com/sgl-project/sglang/pull/426
- Fix streaming by @merrymercy in https://github.com/sgl-project/sglang/pull/437
- Update version to 0.1.16 by @merrymercy in https://github.com/sgl-project/sglang/pull/438
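Example: sending pre-tokenized input to /generate (#363). The change list above notes that the /generate endpoint now accepts input_ids. Below is a minimal sketch of what such a request might look like. It only builds the request and does not send it. The helper name build_generate_request, the base URL with port 30000, the token IDs, and the sampling_params fields are illustrative assumptions and are not taken from this release; check the server documentation for the exact request schema.

```python
import json
import urllib.request


def build_generate_request(input_ids, max_new_tokens=16,
                           base_url="http://localhost:30000"):
    """Build (but do not send) a POST request for the /generate endpoint.

    Hypothetical helper: base_url and the sampling_params layout are
    assumptions made for illustration, not part of these release notes.
    """
    if not input_ids or not all(isinstance(t, int) for t in input_ids):
        raise ValueError("input_ids must be a non-empty list of ints")

    payload = {
        # Pre-tokenized prompt, passed instead of a text prompt.
        "input_ids": list(input_ids),
        # Assumed sampling options; adjust to the server's schema.
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder token IDs; real IDs come from the model's tokenizer.
req = build_generate_request([1, 15043, 3186])
print(req.full_url)
print(req.data.decode("utf-8"))
```

With a server running, the request could be sent with urllib.request.urlopen(req). Sending token IDs directly can be useful when the client already tokenizes prompts and wants to avoid a second tokenization pass on the server.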
New Contributors
- @Qubitium made their first contribution in https://github.com/sgl-project/sglang/pull/281
- @amirarsalan90 made their first contribution in https://github.com/sgl-project/sglang/pull/242
- @Liurl21 made their first contribution in https://github.com/sgl-project/sglang/pull/290
- @ZhouGongZaiShi made their first contribution in https://github.com/sgl-project/sglang/pull/291
- @Luodian made their first contribution in https://github.com/sgl-project/sglang/pull/311
- @janimo made their first contribution in https://github.com/sgl-project/sglang/pull/304
- @lockon-n made their first contribution in https://github.com/sgl-project/sglang/pull/315
- @SimoneRaponi made their first contribution in https://github.com/sgl-project/sglang/pull/346
- @tom-doerr made their first contribution in https://github.com/sgl-project/sglang/pull/354
- @ZhouXingg made their first contribution in https://github.com/sgl-project/sglang/pull/369
- @fronx made their first contribution in https://github.com/sgl-project/sglang/pull/368
- @ispobock made their first contribution in https://github.com/sgl-project/sglang/pull/364
- @joschkabraun made their first contribution in https://github.com/sgl-project/sglang/pull/390
- @noah-kim-theori made their first contribution in https://github.com/sgl-project/sglang/pull/411
- @lolipopshock made their first contribution in https://github.com/sgl-project/sglang/pull/363
- @ZhangYuanhan-AI made their first contribution in https://github.com/sgl-project/sglang/pull/426
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.1.13...v0.1.16