v0.1.20

sgl-project/sglang

Release date: 2024-07-14 08:33:05

Latest release of sgl-project/sglang: v0.3.0 (2024-09-04 19:50:29)

Highlights

CUDA graph is now enabled by default, bringing a 1.5x to 2x speedup for small-batch-size decoding (#612)
Model support: Gemma2, MiniCPM, Qwen2 MoE
Docker support (#217)
Various latency optimizations

What's Changed

Add docker file by @Ying1123 in https://github.com/sgl-project/sglang/pull/588
Add Gemma2 by @Ying1123 in https://github.com/sgl-project/sglang/pull/592
Format by @Ying1123 in https://github.com/sgl-project/sglang/pull/593
Fix Llava model by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/594
fix(detokenizer_manager.py): fix truncated decoded output by @Titan-p in https://github.com/sgl-project/sglang/pull/586
Add --enable-p2p-check option by @hnyls2002 in https://github.com/sgl-project/sglang/pull/599
Fix streaming by @hnyls2002 in https://github.com/sgl-project/sglang/pull/600
Reduce number of workspaces for flashinfer by @wisclmy0611 in https://github.com/sgl-project/sglang/pull/601
add LogitsMetadata by @hnyls2002 in https://github.com/sgl-project/sglang/pull/604
add minicpm support by @Titan-p in https://github.com/sgl-project/sglang/pull/602
Make sglang compat with vllm 0.5.1 by @M0gician in https://github.com/sgl-project/sglang/pull/598
Add Qwen2 MoE support by @M0gician in https://github.com/sgl-project/sglang/pull/603
Update chat template for qwen and yi-1.5. by @for-just-we in https://github.com/sgl-project/sglang/pull/530
[Feat] Expose logprob options to sgl.gen API by @huyiwen in https://github.com/sgl-project/sglang/pull/503
Fix bench latency by @merrymercy in https://github.com/sgl-project/sglang/pull/607
Code clean up: Remove deprecated prefill; move InputMetadata to infer_batch.py by @merrymercy in https://github.com/sgl-project/sglang/pull/609
Clean up the usage of flashinfer by @merrymercy in https://github.com/sgl-project/sglang/pull/610
Cleanup attention backend: flashinfer and triton by @merrymercy in https://github.com/sgl-project/sglang/pull/611
Enable cuda graph by default by @merrymercy in https://github.com/sgl-project/sglang/pull/612
Improve benchmark scripts & fix llava by @merrymercy in https://github.com/sgl-project/sglang/pull/613
Memorypool chunked prefetch by @hnyls2002 in https://github.com/sgl-project/sglang/pull/614
Improve benchmark scripts by @merrymercy in https://github.com/sgl-project/sglang/pull/615
Fix memory pool index error by @Ying1123 in https://github.com/sgl-project/sglang/pull/616
Bump version to 0.1.20 by @merrymercy in https://github.com/sgl-project/sglang/pull/618

New Contributors

@wisclmy0611 made their first contribution in https://github.com/sgl-project/sglang/pull/594
@Titan-p made their first contribution in https://github.com/sgl-project/sglang/pull/586
@M0gician made their first contribution in https://github.com/sgl-project/sglang/pull/598
@for-just-we made their first contribution in https://github.com/sgl-project/sglang/pull/530

Full Changelog: https://github.com/sgl-project/sglang/compare/v0.1.18...v0.1.20
