v2.1.1
Release date: 2024-07-04 18:43:49
Latest release of huggingface/text-generation-inference: v3.0.1 (2024-12-12 04:13:58)
Main changes
- Bugfixes
- Added FlashDecoding support (beta): set FLASH_DECODING=1 to run TGI with flash decoding (large speedups on long queries). https://github.com/huggingface/text-generation-inference/pull/1940
- Use Marlin over GPTQ kernels for faster GPTQ inference https://github.com/huggingface/text-generation-inference/pull/2111
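Both headline features are enabled at launch time. A minimal sketch of each invocation (the model IDs and port are illustrative, not prescribed by this release):

```shell
# Opt in to the beta FlashDecoding kernels via the environment variable.
FLASH_DECODING=1 text-generation-launcher \
    --model-id mistralai/Mistral-7B-Instruct-v0.2 \
    --port 8080

# For GPTQ checkpoints, launch with --quantize gptq as before;
# supported configurations are now served with the faster Marlin kernels.
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-GPTQ \
    --quantize gptq \
    --port 8080
```

No API changes are required on the client side; both switches only affect which attention/matmul kernels the server selects.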
What's Changed
- Fixing the CI to also run in release when it's a tag ? by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2138
- fix microsoft/Phi-3-mini-4k-instruct crash in batch.slots[batch.slot_… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2148
- Fixing clippy. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2149
- fix: use weights from base_layer by @drbh in https://github.com/huggingface/text-generation-inference/pull/2141
- feat: download lora adapter weights from launcher by @drbh in https://github.com/huggingface/text-generation-inference/pull/2140
- Use GPTQ-Marlin for supported GPTQ configurations by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2111
- fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' by @icyxp in https://github.com/huggingface/text-generation-inference/pull/2123
- refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2132
- fix: prefer serde structs over custom functions by @drbh in https://github.com/huggingface/text-generation-inference/pull/2127
- Fixing test. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2152
- GH router. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2153
- Fixing baichuan override. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2158
- [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1940
- Fixing graph capture for flash decoding. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2163
- fix FlashDecoding change's regression in intel platform by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2161
- fix: use the base layers weight in mistral rocm by @drbh in https://github.com/huggingface/text-generation-inference/pull/2155
- Fixing rocm. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2164
- Ci test by @glegendre01 in https://github.com/huggingface/text-generation-inference/pull/2124
- Hotfixing qwen2 and starcoder2 (which also get clamping). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2167
- feat: improve update_docs for openapi schema by @drbh in https://github.com/huggingface/text-generation-inference/pull/2169
- Fixing the dockerfile warnings. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2173
- Fixing missing `object` field for regular completions. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2175
New Contributors
- @icyxp made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2123
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.1.0...v2.1.1