v3.0.0
Released: 2024-12-10 04:22:42
TL;DR
A major new release, centered on chunked prefill and automatic configuration defaults.
Details: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
What's Changed
- feat: concat the adapter id to the model id in chat response by @drbh in https://github.com/huggingface/text-generation-inference/pull/2779
- Move JSON grammar -> regex grammar conversion to the router by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2772 (sketch after this list)
- Use FP8 KV cache when specified by compressed-tensors by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2761
- upgrade ipex cpu to fix coredump in tiiuae/falcon-7b-instruct (pageat… by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2778
- Fix: docs typo by @jp1924 in https://github.com/huggingface/text-generation-inference/pull/2777
- Support continue final message by @drbh in https://github.com/huggingface/text-generation-inference/pull/2733 (sketch after this list)
- Fix doc. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2792
- Removing ../ that broke the link by @Getty in https://github.com/huggingface/text-generation-inference/pull/2789
- fix: add merge-lora arg for model id by @drbh in https://github.com/huggingface/text-generation-inference/pull/2788
- fix: only use eos_token_id as pad_token_id if int by @dvrogozh in https://github.com/huggingface/text-generation-inference/pull/2774
- Sync (most) server dependencies with Nix by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2782
- Saving some VRAM. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2790
- fix: avoid setting use_sgmv if no kernels present by @drbh in https://github.com/huggingface/text-generation-inference/pull/2796
- use oneapi 2024 docker image directly for xpu by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2793
- feat: auto max_new_tokens by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2803 (sketch after this list)
- Auto max prefill by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2797
- Adding A100 compute. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2806
- Enable paligemma2 by @drbh in https://github.com/huggingface/text-generation-inference/pull/2807
- Attempt for cleverer auto batch_prefill values (some simplifications). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2808
- V3 doc by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2809
- Prep new version by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2810
- Hotfixing the link. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2811
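PR 2772 moves the JSON-schema-to-regex compilation for guided generation from the Python shards into the router; the client-facing request shape is unchanged. A minimal sketch of a guided-JSON request, assuming a TGI server listening on localhost:8080 (the address and schema are illustrative):

```python
import requests

# Illustrative JSON schema for guided generation.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# The schema-to-regex conversion now happens once in the router instead
# of in each Python shard; the request itself looks the same as before.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Introduce a fictional person as JSON:",
        "parameters": {"grammar": {"type": "json", "value": schema}},
    },
)
print(resp.json()["generated_text"])
```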
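PR 2733 lets the chat endpoint continue a trailing assistant message instead of opening a fresh turn. A minimal sketch using the OpenAI-compatible endpoint, assuming a local server at localhost:8080; the model name and prompt are illustrative:

```python
from openai import OpenAI

# TGI exposes an OpenAI-compatible /v1 endpoint; the API key is unused
# for a local server, and the model name is a placeholder.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

response = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "List three prime numbers."},
        # Ending on an assistant message asks the server to continue this
        # message rather than start a new assistant turn.
        {"role": "assistant", "content": "The first prime number is"},
    ],
)
print(response.choices[0].message.content)
```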
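PR 2803 makes max_new_tokens effectively optional: per the PR title, omitting it now yields an automatic budget derived from the remaining context rather than a small fixed default (see the PR for the exact behavior). A minimal sketch, again assuming a local server at localhost:8080:

```python
import requests

# No max_new_tokens in the parameters: with v3 the server picks the
# generation budget automatically instead of using a small fixed value.
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "Write a short haiku about rivers."},
)
print(resp.json()["generated_text"])
```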
New Contributors
- @jp1924 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2777
- @Getty made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2789
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.4.1...v3.0.0