v2.2.0
Released: 2024-07-24 00:30:03
Notable changes
- Llama 3.1 support (including 405B, with FP8 support in many mixed configurations: FP8, AWQ, GPTQ, and FP8+FP16).
- Gemma2 softcap support (see the softcapping sketch after this list).
- Deepseek v2 support.
- Lots of internal reworks/cleanup (allowing for cool features)
- Lots of AWQ/GPTQ work with marlin kernels (everything should be faster by default)
- Flash decoding support (opt in with the FLASH_DECODING=1 environment variable; this will probably enable some nice improvements in the future). See the launch sketch after this list.
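
For context on the Gemma2 item above: softcapping bounds a tensor of logits to (-cap, cap) with a scaled tanh, leaving small values roughly unchanged. The sketch below is illustrative rather than TGI's actual implementation, and the cap values in the comments come from the public Gemma 2 config, not from this release.

```python
import torch

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Scale down, squash with tanh, scale back up: outputs stay in
    # (-cap, cap) and are near-linear while |logits| << cap.
    return cap * torch.tanh(logits / cap)

# Illustrative: the public Gemma 2 config caps attention scores at 50.0
# and final LM-head logits at 30.0.
scores = torch.randn(2, 8, 16, 16) * 100  # exaggerated magnitudes
capped = softcap(scores, cap=50.0)
assert capped.abs().max().item() < 50.0
```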
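
And for the flash decoding item: FLASH_DECODING=1 is set on the server process at launch. The snippet below is a hypothetical launch wrapper, assuming a locally installed text-generation-launcher; the model id and port are placeholders.

```python
import os
import subprocess

# FLASH_DECODING=1 is the opt-in variable named in these notes;
# everything else here (model id, port) is illustrative.
env = dict(os.environ, FLASH_DECODING="1")

subprocess.run(
    [
        "text-generation-launcher",  # TGI launcher binary
        "--model-id", "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "--port", "8080",
    ],
    env=env,
    check=True,
)
```

The more common Docker deployment sets the same variable on the container with -e FLASH_DECODING=1.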
What's Changed
- Preparing patch release. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2186
- Adding "longrope" for Phi-3 (#2172) by @amihalik in https://github.com/huggingface/text-generation-inference/pull/2179
- Refactor dead code - Removing all `flash_xxx.py` files. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2166
- Fix Starcoder2 after refactor by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2189
- GPTQ CI improvements by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2151
- Consistently take `prefix` in model constructors by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2191
- fix dbrx & opt model prefix bug by @icyxp in https://github.com/huggingface/text-generation-inference/pull/2201
- hotfix: Fix number of KV heads by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2202
- Fix incorrect cache allocation with multi-query by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2203
- Falcon/DBRX: get correct number of key-value heads by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2205
- add doc for intel gpus by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2181
- fix: python deserialization by @jaluma in https://github.com/huggingface/text-generation-inference/pull/2178
- update to metrics 0.23.0 or could work with metrics-exporter-promethe… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2190
- feat: use model name as adapter id in chat endpoints by @drbh in https://github.com/huggingface/text-generation-inference/pull/2128
- Fix nccl regression on PyTorch 2.3 upgrade by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/2099
- Fix buildx cache + change runner type by @glegendre01 in https://github.com/huggingface/text-generation-inference/pull/2176
- Fixed README ToC by @vinkamath in https://github.com/huggingface/text-generation-inference/pull/2196
- Updating the self check by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2209
- Move quantized weight handling out of the `Weights` class by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2194
- Add support for FP8 on compute capability >=8.0, <8.9 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2213
- fix: append DONE message to chat stream by @drbh in https://github.com/huggingface/text-generation-inference/pull/2221
- [fix] Modifying base in yarn embedding by @SeongBeomLEE in https://github.com/huggingface/text-generation-inference/pull/2212
- Use symmetric quantization in the `quantize` subcommand by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2120
- feat: simple mistral lora integration tests by @drbh in https://github.com/huggingface/text-generation-inference/pull/2180
- fix custom cache dir by @ErikKaum in https://github.com/huggingface/text-generation-inference/pull/2226
- fix: Remove bitsandbytes installation when running cpu-only install by @Hugoch in https://github.com/huggingface/text-generation-inference/pull/2216
- Add support for AWQ-quantized Idefics2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2233
- `server quantize`: expose groupsize option by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2225
- Remove stray `quantize` argument in `get_weights_col_packed_qkv` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2237
- fix(server): fix cohere by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2249
- Improve the handling of quantized weights by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2250
- Hotfix: fix of use of unquantized weights in Gemma GQA loading by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2255
- Hotfix: various GPT-based model fixes by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2256
- Hotfix: fix MPT after recent refactor by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2257
- Hotfix: pass through model revision in `VlmCausalLM` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2258
- usage stats and crash reports by @ErikKaum in https://github.com/huggingface/text-generation-inference/pull/2220
- add usage stats to toctree by @ErikKaum in https://github.com/huggingface/text-generation-inference/pull/2260
- fix: adjust default tool choice by @drbh in https://github.com/huggingface/text-generation-inference/pull/2244
- Add support for Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2224
- re-push to internal registry by @XciD in https://github.com/huggingface/text-generation-inference/pull/2242
- Add FP8 release test by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2261
- feat(fp8): use fbgemm kernels and load fp8 weights directly by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2248
- fix(server): fix deepseekv2 loading by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2266
- Hotfix: fix of use of unquantized weights in Mixtral GQA loading by @icyxp in https://github.com/huggingface/text-generation-inference/pull/2269
- legacy warning on text_generation client by @ErikKaum in https://github.com/huggingface/text-generation-inference/pull/2271
- fix(ci): test new instances by @XciD in https://github.com/huggingface/text-generation-inference/pull/2272
- fix(server): fix fp8 weight loading by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2268
- Softcapping for gemma2. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2273
- use proper name for ci by @XciD in https://github.com/huggingface/text-generation-inference/pull/2274
- Fixing mistral nemo. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2276
- fix(l4): fix fp8 logic on l4 by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2277
- Add support for repacking AWQ weights for GPTQ-Marlin by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2278
- [WIP] Add support for Mistral-Nemo by supporting head_dim through config by @shaltielshmid in https://github.com/huggingface/text-generation-inference/pull/2254
- Preparing for release. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2285
- Add support for Llama 3 rotary embeddings by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2286
- hotfix: pin numpy by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2289
New Contributors
- @jaluma made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2178
- @vinkamath made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2196
- @ErikKaum made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2226
- @Hugoch made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2216
- @XciD made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2242
- @shaltielshmid made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2254
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.1.1...v2.2.0