v2.3.1
Release date: 2024-10-03 21:01:49
Important changes
- Added support for Mllama (the Llama 3.2 vision models), with a flash-attention, unpadded implementation.
- FP8 performance improvements
- MoE performance improvements
- BREAKING CHANGE: when using tools, models could previously answer with a `notify_error` tool call carrying the error as its content; they will now output a regular generation instead.
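The breaking change above affects client-side parsing: before this release, a model that could not satisfy a tool request might reply with a tool call named `notify_error`, whereas from v2.3.1 the server emits an ordinary text generation. A minimal, hypothetical sketch of client code that tolerates both shapes; the response dicts below mirror the OpenAI-style chat completion schema that TGI exposes, but their exact layout here is an assumption for illustration:

```python
def extract_answer(message: dict) -> str:
    """Return plain text from a chat completion message, handling both the
    pre-2.3.1 `notify_error` tool call and the new plain-content reply.

    NOTE: the field layout is an assumption for illustration, not a
    verified TGI response spec.
    """
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        if fn.get("name") == "notify_error":
            # Old behavior (< 2.3.1): the error text was wrapped in a tool call.
            return fn.get("arguments", "")
    # New behavior (>= 2.3.1): a regular generated message.
    return message.get("content") or ""


# Hypothetical old-style response (pre-2.3.1).
old = {"tool_calls": [{"function": {"name": "notify_error",
                                    "arguments": "I cannot call that tool."}}]}
# Hypothetical new-style response (2.3.1+): regular generation.
new = {"content": "I cannot call that tool.", "tool_calls": None}

assert extract_answer(old) == extract_answer(new)
```

Clients that previously special-cased the `notify_error` tool name should now read the message content directly.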
What's Changed
- nix: remove unused `_server.nix` file by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2538
- chore: Add old V2 backend by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2551
- Remove duplicated `RUN` in Dockerfile by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2547
- Micro cleanup. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2555
- Hotfixing main by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2556
- Add support for scalar FP8 weight scales by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2550
- Add `DenseMoELayer` and wire it up in Mixtral/Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2537
- Update the link to the Ratatui organization by @orhun in https://github.com/huggingface/text-generation-inference/pull/2546
- Simplify crossterm imports by @orhun in https://github.com/huggingface/text-generation-inference/pull/2545
- Adding note for private models in quick-tour document by @ariG23498 in https://github.com/huggingface/text-generation-inference/pull/2548
- Hotfixing main. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2562
- Cleanup Vertex + Chat by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2553
- More tensor cores. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2558
- remove LORA_ADAPTERS_PATH by @nbroad1881 in https://github.com/huggingface/text-generation-inference/pull/2563
- Add LoRA adapters support for Gemma2 by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2567
- Fix build with `--features google` by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2566
- Improve support for GPUs with capability < 8 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2575
- flashinfer: pass window size and dtype by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2574
- Remove compute capability lazy cell by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2580
- Update architecture.md by @ulhaqi12 in https://github.com/huggingface/text-generation-inference/pull/2577
- Update ROCM libs and improvements by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2579
- Add support for GPTQ-quantized MoE models using MoE Marlin by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2557
- feat: support phi3.5 moe by @drbh in https://github.com/huggingface/text-generation-inference/pull/2479
- Move flake back to tgi-nix `main` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2586
- MoE Marlin: support `desc_act` for `groupsize != -1` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2590
- nix: experimental support for building a Docker container by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2470
- Mllama flash version by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2585
- Max token capacity metric by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2595
- CI (2592): Allow LoRA adapter revision in server launcher by @drbh in https://github.com/huggingface/text-generation-inference/pull/2602
- Unroll notify error into generate response by @drbh in https://github.com/huggingface/text-generation-inference/pull/2597
- New release 2.3.1 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2604
New Contributors
- @alvarobartt made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2547
- @orhun made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2546
- @ariG23498 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2548
- @ulhaqi12 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2577
- @mht-sharma made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2579
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.3.1