v2.1.1
Release date: 2024-07-04 18:43:49
Latest release of huggingface/text-generation-inference: v3.0.1 (2024-12-12 04:13:58)
Main changes
- Bugfixes
- Added FlashDecoding support (beta): set FLASH_DECODING=1 to run TGI with flash decoding (large speedups on long queries). https://github.com/huggingface/text-generation-inference/pull/1940
- Use Marlin over GPTQ kernels for faster GPTQ inference https://github.com/huggingface/text-generation-inference/pull/2111
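Both headline features are enabled at launch time. A minimal sketch of each invocation (the model IDs and port are illustrative, not prescribed by this release):

```shell
# Opt in to the beta FlashDecoding kernels via the environment variable.
FLASH_DECODING=1 text-generation-launcher \
    --model-id mistralai/Mistral-7B-Instruct-v0.2 \
    --port 8080

# For GPTQ checkpoints, launch with --quantize gptq as before;
# supported configurations are now served with the faster Marlin kernels.
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-GPTQ \
    --quantize gptq \
    --port 8080
```

No API changes are required on the client side; both switches only affect which attention/matmul kernels the server selects.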
What's Changed
- Fixing the CI to also run in release when it's a tag ? by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2138
- fix microsoft/Phi-3-mini-4k-instruct crash in batch.slots[batch.slot_… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2148
- Fixing clippy. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2149
- fix: use weights from base_layer by @drbh in https://github.com/huggingface/text-generation-inference/pull/2141
- feat: download lora adapter weights from launcher by @drbh in https://github.com/huggingface/text-generation-inference/pull/2140
- Use GPTQ-Marlin for supported GPTQ configurations by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2111
- fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' by @icyxp in https://github.com/huggingface/text-generation-inference/pull/2123
- refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2132
- fix: prefer serde structs over custom functions by @drbh in https://github.com/huggingface/text-generation-inference/pull/2127
- Fixing test. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2152
- GH router. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2153
- Fixing baichuan override. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2158
- [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1940
- Fixing graph capture for flash decoding. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2163
- fix FlashDecoding change's regression in intel platform by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2161
- fix: use the base layers weight in mistral rocm by @drbh in https://github.com/huggingface/text-generation-inference/pull/2155
- Fixing rocm. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2164
- Ci test by @glegendre01 in https://github.com/huggingface/text-generation-inference/pull/2124
- Hotfixing qwen2 and starcoder2 (which also get clamping). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2167
- feat: improve update_docs for openapi schema by @drbh in https://github.com/huggingface/text-generation-inference/pull/2169
- Fixing the dockerfile warnings. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2173
- Fixing missing `object` field for regular completions. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2175
New Contributors
- @icyxp made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2123
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.1.0...v2.1.1