v3.0.0
Released: 2024-12-10 04:22:42
TL;DR
A major new release, centered on chunked prefill and automatic configuration defaults.
Details: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
What's Changed
- feat: concat the adapter id to the model id in chat response by @drbh in https://github.com/huggingface/text-generation-inference/pull/2779
- Move JSON grammar -> regex grammar conversion to the router by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2772 (sketch after this list)
- Use FP8 KV cache when specified by compressed-tensors by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2761
- upgrade ipex cpu to fix coredump in tiiuae/falcon-7b-instruct (pageat… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2778
- Fix: docs typo by @jp1924 in https://github.com/huggingface/text-generation-inference/pull/2777
- Support continue final message by @drbh in https://github.com/huggingface/text-generation-inference/pull/2733 (sketch after this list)
- Fix doc. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2792
- Removing ../ that broke the link by @Getty in https://github.com/huggingface/text-generation-inference/pull/2789
- fix: add merge-lora arg for model id by @drbh in https://github.com/huggingface/text-generation-inference/pull/2788
- fix: only use eos_token_id as pad_token_id if int by @dvrogozh in https://github.com/huggingface/text-generation-inference/pull/2774
- Sync (most) server dependencies with Nix by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2782
- Saving some VRAM. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2790
- fix: avoid setting use_sgmv if no kernels present by @drbh in https://github.com/huggingface/text-generation-inference/pull/2796
- use oneapi 2024 docker image directly for xpu by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2793
- feat: auto max_new_tokens by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2803 (sketch after this list)
- Auto max prefill by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2797
- Adding A100 compute. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2806
- Enable paligemma2 by @drbh in https://github.com/huggingface/text-generation-inference/pull/2807
- Attempt for cleverer auto batch_prefill values (some simplifications). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2808
- V3 doc by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2809
- Prep new version by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2810
- Hotfixing the link. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2811
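PR 2772 moves the JSON-schema-to-regex compilation for guided generation from the Python shards into the router; the client-facing request shape is unchanged. A minimal sketch of a guided-JSON request, assuming a TGI server listening on localhost:8080 (the address and schema are illustrative):

```python
import requests

# Illustrative JSON schema for guided generation.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# The schema-to-regex conversion now happens once in the router instead
# of in each Python shard; the request itself looks the same as before.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Introduce a fictional person as JSON:",
        "parameters": {"grammar": {"type": "json", "value": schema}},
    },
)
print(resp.json()["generated_text"])
```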
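PR 2733 lets the chat endpoint continue a trailing assistant message instead of opening a fresh turn. A minimal sketch using the OpenAI-compatible endpoint, assuming a local server at localhost:8080; the model name and prompt are illustrative:

```python
from openai import OpenAI

# TGI exposes an OpenAI-compatible /v1 endpoint; the API key is unused
# for a local server, and the model name is a placeholder.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

response = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "List three prime numbers."},
        # Ending on an assistant message asks the server to continue this
        # message rather than start a new assistant turn.
        {"role": "assistant", "content": "The first prime number is"},
    ],
)
print(response.choices[0].message.content)
```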
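PR 2803 makes max_new_tokens effectively optional: per the PR title, omitting it now yields an automatic budget derived from the remaining context rather than a small fixed default (see the PR for the exact behavior). A minimal sketch, again assuming a local server at localhost:8080:

```python
import requests

# No max_new_tokens in the parameters: with v3 the server picks the
# generation budget automatically instead of using a small fixed value.
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "Write a short haiku about rivers."},
)
print(resp.json()["generated_text"])
```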
New Contributors
- @jp1924 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2777
- @Getty made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2789
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.4.1...v3.0.0