v2.3.1
Release date: 2024-10-03 21:01:49
Important changes
- Added support for Mllama (the Llama 3.2 vision models), with a flash-attention, unpadded implementation.
- FP8 performance improvements
- MoE performance improvements
- BREAKING CHANGE: when using tools, models could previously answer with a `notify_error` tool call carrying the error as its content; they will now output a regular generation instead.
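The breaking change above affects client-side parsing: before this release, a model that could not satisfy a tool request might reply with a tool call named `notify_error`, whereas from v2.3.1 the server emits an ordinary text generation. A minimal, hypothetical sketch of client code that tolerates both shapes; the response dicts below mirror the OpenAI-style chat completion schema that TGI exposes, but their exact layout here is an assumption for illustration:

```python
def extract_answer(message: dict) -> str:
    """Return plain text from a chat completion message, handling both the
    pre-2.3.1 `notify_error` tool call and the new plain-content reply.

    NOTE: the field layout is an assumption for illustration, not a
    verified TGI response spec.
    """
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        if fn.get("name") == "notify_error":
            # Old behavior (< 2.3.1): the error text was wrapped in a tool call.
            return fn.get("arguments", "")
    # New behavior (>= 2.3.1): a regular generated message.
    return message.get("content") or ""


# Hypothetical old-style response (pre-2.3.1).
old = {"tool_calls": [{"function": {"name": "notify_error",
                                    "arguments": "I cannot call that tool."}}]}
# Hypothetical new-style response (2.3.1+): regular generation.
new = {"content": "I cannot call that tool.", "tool_calls": None}

assert extract_answer(old) == extract_answer(new)
```

Clients that previously special-cased the `notify_error` tool name should now read the message content directly.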
What's Changed
- nix: remove unused `_server.nix` file by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2538
- chore: Add old V2 backend by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2551
- Remove duplicated `RUN` in Dockerfile by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2547
- Micro cleanup. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2555
- Hotfixing main by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2556
- Add support for scalar FP8 weight scales by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2550
- Add `DenseMoELayer` and wire it up in Mixtral/Deepseek V2 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2537
- Update the link to the Ratatui organization by @orhun in https://github.com/huggingface/text-generation-inference/pull/2546
- Simplify crossterm imports by @orhun in https://github.com/huggingface/text-generation-inference/pull/2545
- Adding note for private models in quick-tour document by @ariG23498 in https://github.com/huggingface/text-generation-inference/pull/2548
- Hotfixing main. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2562
- Cleanup Vertex + Chat by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2553
- More tensor cores. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2558
- remove LORA_ADAPTERS_PATH by @nbroad1881 in https://github.com/huggingface/text-generation-inference/pull/2563
- Add LoRA adapters support for Gemma2 by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2567
- Fix build with `--features google` by @alvarobartt in https://github.com/huggingface/text-generation-inference/pull/2566
- Improve support for GPUs with capability < 8 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2575
- flashinfer: pass window size and dtype by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2574
- Remove compute capability lazy cell by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2580
- Update architecture.md by @ulhaqi12 in https://github.com/huggingface/text-generation-inference/pull/2577
- Update ROCM libs and improvements by @mht-sharma in https://github.com/huggingface/text-generation-inference/pull/2579
- Add support for GPTQ-quantized MoE models using MoE Marlin by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2557
- feat: support phi3.5 moe by @drbh in https://github.com/huggingface/text-generation-inference/pull/2479
- Move flake back to tgi-nix `main` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2586
- MoE Marlin: support `desc_act` for `groupsize != -1` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2590
- nix: experimental support for building a Docker container by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2470
- Mllama flash version by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2585
- Max token capacity metric by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2595
- CI (2592): Allow LoRA adapter revision in server launcher by @drbh in https://github.com/huggingface/text-generation-inference/pull/2602
- Unroll notify error into generate response by @drbh in https://github.com/huggingface/text-generation-inference/pull/2597
- New release 2.3.1 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2604
New Contributors
- @alvarobartt made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2547
- @orhun made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2546
- @ariG23498 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2548
- @ulhaqi12 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2577
- @mht-sharma made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2579
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.3.0...v2.3.1