sgl-project/sglang
Forks: 549 · Stars: 6285 (updated 2024-12-01 03:09:14)
license: Apache-2.0
Language: Python
SGLang is a fast serving framework for large language models and vision language models.
Latest release: v0.3.0 (2024-09-04 19:50:29)
| Blog | Documentation | Join Slack | Join Bi-Weekly Development Meeting | Slides |
News
- [2024/10] 🔥 The First SGLang Online Meetup (slides).
- [2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
- [2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).
More
- [2024/02] SGLang enables 3x faster JSON decoding with a compressed finite state machine (blog); a usage sketch follows this list.
- [2024/04] SGLang is used by the official LLaVA-NeXT (video) release (blog).
- [2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
- [2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).
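The compressed-FSM speedup mentioned above is exposed to users through regex-constrained generation in the frontend language. Below is a minimal sketch, assuming a locally running SGLang server at http://localhost:30000 and the `regex` argument of `sgl.gen`; the function name, prompt, and JSON shape are illustrative only.

```python
import sglang as sgl


@sgl.function
def extract_city(s, text):
    # Plain prompt text is appended to the program state `s`.
    s += "Extract the city and country mentioned in the text as JSON.\n"
    s += "Text: " + text + "\nJSON: "
    # The regex constrains decoding to a fixed JSON shape; the compressed
    # finite state machine lets the runtime jump over the constant parts
    # instead of decoding them token by token.
    s += sgl.gen(
        "json_output",
        max_tokens=64,
        regex=r'\{"city": "[A-Za-z ]+", "country": "[A-Za-z ]+"\}',
    )


# Assumes a server launched separately (e.g. via sglang.launch_server).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = extract_city.run(text="The Eiffel Tower is in Paris, France.")
print(state["json_output"])
```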
About
SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:
- Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).
- Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions (a short sketch follows this list).
- Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, QWen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
- Active Community: SGLang is open-source and backed by an active community with industry adoption.
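As a sketch of the frontend language referenced above, the following program chains two generation calls in one conversation. It assumes a running SGLang server at http://localhost:30000; the function name and questions are illustrative, and exact signatures may differ between versions.

```python
import sglang as sgl


@sgl.function
def multi_turn_qa(s, question_1, question_2):
    # Chat roles and generation calls are composed as ordinary Python.
    s += sgl.system("You are a concise assistant.")
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=128))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=128))


# Point the frontend at a running SGLang runtime (launched separately).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_qa.run(
    question_1="What is RadixAttention, in one sentence?",
    question_2="And why does prefix caching help batched serving?",
)
print(state["answer_1"])
print(state["answer_2"])
```

Because many calls to this template share the same prompt prefix, the runtime can reuse the corresponding KV cache across requests via RadixAttention rather than recomputing it.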
Getting Started
- Install SGLang
- Send requests (a minimal example follows this list)
- Backend: SGLang Runtime (SRT)
- Frontend: Structured Generation Language (SGLang)
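A minimal end-to-end sketch of the first two steps, assuming the `[all]` pip extra, the `sglang.launch_server` entry point, and the server's OpenAI-compatible endpoint; the model path and port are placeholders.

```python
# Install (shell):
#   pip install "sglang[all]"
# Launch the backend runtime, SRT (shell), e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000

# Send a request through the OpenAI-compatible API exposed by the server.
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "List three European capitals."}],
    temperature=0,
    max_tokens=64,
)
print(response.choices[0].message.content)
```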
Benchmark And Performance
Learn more in our release blogs: v0.2 blog, v0.3 blog
Roadmap
Adoption and Sponsorship
The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.
Acknowledgment and Citation
We learned from the design and reused code from the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL. Please cite our paper, SGLang: Efficient Execution of Structured Language Model Programs, if you find the project useful.
Recent release history (data updated 2024-09-16 19:41:36):
2024-09-04 19:50:29 v0.3.0
2024-08-16 13:16:08 v0.2.13
2024-08-02 16:55:00 v0.2.9
2024-07-27 03:56:44 v0.2.5
2024-07-25 23:58:24 v0.2.0
2024-07-14 08:33:05 v0.1.20
2024-07-04 14:35:42 v0.1.18
2024-06-08 10:58:55 v0.1.17
2024-05-14 08:36:05 v0.1.16
2024-03-11 20:52:58 v0.1.13
Topics:
cuda, inference, llama, llama2, llama3, llama3-1, llava, llm, llm-serving, moe, pytorch, transformer, vlm