v2.1.0
Release date: 2024-06-28 14:26:01
Notable changes
- New models: gemma2
- Multi-LoRA adapters: you can now run multiple LoRAs on the same TGI deployment (https://github.com/huggingface/text-generation-inference/pull/2010); see the example after this list
- Faster GPTQ inference and Marlin support (up to 2x speedup)
- Reworked the entire scheduling logic (better block allocation, allowing further speedups in future releases)
- Lots of ROCm support and bugfixes
- Lots of new contributors! Thanks a lot for these contributions
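
With multi-LoRA, a single deployment can route different requests to different adapters. Here is a minimal sketch in Python, assuming a TGI server launched with several adapters via the `--lora-adapters` launcher flag; the endpoint URL and adapter name below are placeholders, not values from this release:

```python
import requests

# Assumes a TGI server started with multiple LoRA adapters, e.g.:
#   text-generation-launcher --model-id <base-model> \
#       --lora-adapters <adapter-a>,<adapter-b>
TGI_URL = "http://localhost:8080/generate"  # placeholder endpoint

payload = {
    "inputs": "Write a short haiku about GPUs.",
    "parameters": {
        # Route this request to one specific adapter; other requests on the
        # same deployment can target different adapters.
        "adapter_id": "<adapter-a>",  # hypothetical adapter name
        "max_new_tokens": 64,
    },
}

response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```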
What's Changed
- OpenAI function calling compatible support by @phangiabao98 in https://github.com/huggingface/text-generation-inference/pull/1888 (see the example after this list)
- Fixing types. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1906
- Types. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1909
- Fixing signals. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1910
- Removing some unused code. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1915
- MI300 compatibility by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1764
- Add TGI monitoring guide through Grafana and Prometheus by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1908
- Update grafana template by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1918
- Fix TunableOp bug by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1920
- Fix TGI issues with ROCm by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1921
- Fixing the download strategy for ibm-fms by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1917
- ROCm: make CK FA2 default instead of Triton by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1924
- docs: Fix grafana dashboard url by @edwardzjl in https://github.com/huggingface/text-generation-inference/pull/1925
- feat: include token in client test like server tests by @drbh in https://github.com/huggingface/text-generation-inference/pull/1932
- Creating doc automatically for supported models. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1929
- fix: use path inside of speculator config by @drbh in https://github.com/huggingface/text-generation-inference/pull/1935
- feat: add train medusa head tutorial by @drbh in https://github.com/huggingface/text-generation-inference/pull/1934
- reenable xpu for tgi by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/1939
- Fixing some legacy behavior (big swapout of serverless on legacy stuff). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1937
- Add completion route to client and add stop parameter where it's missing by @thomas-schillaci in https://github.com/huggingface/text-generation-inference/pull/1869
- Improving the logging system. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1938
- Fixing codellama loads by using purely `AutoTokenizer`. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1947
- Fix seeded output. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1949
- Fix (flash) Gemma prefix and enable tests by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1950
- Fix GPTQ for models which do not have float16 at the default dtype (simpler) by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1953
- Processor config chat template by @drbh in https://github.com/huggingface/text-generation-inference/pull/1954
- fix small typo and broken link by @MoritzLaurer in https://github.com/huggingface/text-generation-inference/pull/1958
- Upgrade to Axum 0.7 and Hyper 1.0 (Breaking change: disabled ngrok tunneling). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1959
- Fix (non-container) pytest stdout buffering-related lock-up by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1963
- Fixing the text part from tokenizer endpoint. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1967
- feat: adjust attn weight loading logic by @drbh in https://github.com/huggingface/text-generation-inference/pull/1975
- Add support for exl2-quantized models by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1965
- Update documentation version to 2.0.4 by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1980
- Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1986
- Fixing exl2 scratch buffer. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1990
- single char ` addition for docs by @nbroad1881 in https://github.com/huggingface/text-generation-inference/pull/1989
- Fixing GPTQ imports. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1994
- reenable xpu, broken by gptq and setuptool upgrade by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/1988
- router: send the input as chunks to the backend by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1981
- Fix Phi-2 with `tp>1` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2003
- fix: update triton implementation reference by @emmanuel-ferdman in https://github.com/huggingface/text-generation-inference/pull/2002
- feat: add SchedulerV3 by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1996
- Support GPTQ models with column-packed up/gate tensor by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2006
- Making `make install` work better by default. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2004
- Hotfixing `make install`. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2008
- Do not initialize scratch space when there are no ExLlamaV2 layers by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2015
- feat: move allocation logic to rust by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1835
- Fixing rocm. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2021
- Fix GPTQWeight import by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2020
- Update version on init.py to 0.7.0 by @andimarafioti in https://github.com/huggingface/text-generation-inference/pull/2017
- Add support for Marlin-quantized models by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2014
- marlin: support tp>1 when group_size==-1 by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2032
- marlin: improve build by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2031
- Internal runner? by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2023
- Xpu gqa by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2013
- server: use chunked inputs by @danieldk in https://github.com/huggingface/text-generation-inference/pull/1985
- ROCm and sliding windows fixes by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/2033
- Add Phi-3 medium support by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2039
- feat(ci): add trufflehog secrets detection by @McPatate in https://github.com/huggingface/text-generation-inference/pull/2038
- fix(ci): remove unnecessary permissions by @McPatate in https://github.com/huggingface/text-generation-inference/pull/2045
- Update LLMM1 bound by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/2050
- Support chat response format by @drbh in https://github.com/huggingface/text-generation-inference/pull/2046 (see the sketch after this list)
- fix(server): fix OPT implementation by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2061
- fix(layers): fix SuRotaryEmbedding by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/2060
- PR #2049 CI run by @drbh in https://github.com/huggingface/text-generation-inference/pull/2054
- implement Open Inference Protocol endpoints by @drbh in https://github.com/huggingface/text-generation-inference/pull/1942
- Add support for GPTQ Marlin by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2052
- Update the link for qwen2 by @xianbaoqian in https://github.com/huggingface/text-generation-inference/pull/2068
- Adding architecture document by @tengomucho in https://github.com/huggingface/text-generation-inference/pull/2044
- Support different image sizes in prefill in VLMs by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2065
- Contributing guide & Code of Conduct by @LysandreJik in https://github.com/huggingface/text-generation-inference/pull/2074
- fix build.rs watch files by @zirconium-n in https://github.com/huggingface/text-generation-inference/pull/2072
- Set maximum grpc message receive size to 2GiB by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2075
- CI: Tailscale improvements by @glegendre01 in https://github.com/huggingface/text-generation-inference/pull/2079
- CI: pass pre-commit hooks again by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2084
- feat: rotate tests ci token by @drbh in https://github.com/huggingface/text-generation-inference/pull/2091
- Support exl2-quantized Qwen2 models by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2085
- Factor out sharding of packed tensors by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2059
- Fix `text-generation-server quantize` by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2103
- feat: sort cuda graphs in descending order by @drbh in https://github.com/huggingface/text-generation-inference/pull/2104
- New runner. Manual squash. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2110
- Fix cargo-chef prepare by @ur4t in https://github.com/huggingface/text-generation-inference/pull/2101
- Support `HF_TOKEN` environment variable by @Wauplin in https://github.com/huggingface/text-generation-inference/pull/2066
- Add OTLP Service Name Environment Variable by @KevinDuffy94 in https://github.com/huggingface/text-generation-inference/pull/2076
- corrected Pydantic warning. by @yukiman76 in https://github.com/huggingface/text-generation-inference/pull/2095
- use xpu-smi to dump used memory by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2047
- fix ChatCompletion and ChatCompletionChunk object string not compatible with standard openai api by @sunxichen in https://github.com/huggingface/text-generation-inference/pull/2089
- Cpu tgi by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/1936
- feat: add simple tests for weights by @drbh in https://github.com/huggingface/text-generation-inference/pull/2092
- Removing IPEX_AVAIL. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2115
- fix cpu and xpu issue by @sywangyi in https://github.com/huggingface/text-generation-inference/pull/2116
- Add pytest release marker by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2114
- Fix CI. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2118
- Enable multiple LoRa adapters by @drbh in https://github.com/huggingface/text-generation-inference/pull/2010
- Support AWQ quantization with bias by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2117
- Add support for Marlin 2:4 sparsity by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2102
- fix: simplify kserve endpoint and fix imports by @drbh in https://github.com/huggingface/text-generation-inference/pull/2119
- Fixing prom leak by upgrading. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2129
- Bumping to 2.1 by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2131
- Idefics2: sync added image tokens with transformers by @danieldk in https://github.com/huggingface/text-generation-inference/pull/2080
- Fixing malformed rust tokenizers by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2134
- Fixing gemma2. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/2135
- fix: refactor post_processor logic and add test by @drbh in https://github.com/huggingface/text-generation-inference/pull/2137
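
To illustrate the OpenAI function calling compatibility added in https://github.com/huggingface/text-generation-inference/pull/1888, here is a minimal sketch using the standard `openai` Python client against a TGI deployment. The local URL and the `get_weather` tool are hypothetical illustrations, not part of this release:

```python
from openai import OpenAI

# Point the standard OpenAI client at a running TGI server
# (assumes TGI is listening on localhost:8080).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="tgi",  # TGI serves one model; the name is a placeholder
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message)
```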
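
Similarly, the chat `response_format` support from https://github.com/huggingface/text-generation-inference/pull/2046 lets a request constrain output to a JSON schema. The sketch below is an assumption: the `json_object`/`value` field names follow TGI's guidance documentation from this era and may differ between versions, and the URL and schema are placeholders:

```python
import requests

# TGI-specific extension of the OpenAI chat schema: response_format carries
# a JSON Schema that constrains generation. Field names are an assumption
# based on TGI's guidance docs and may vary across versions.
payload = {
    "model": "tgi",
    "messages": [
        {"role": "user", "content": "Name one animal and its habitat."}
    ],
    "response_format": {
        "type": "json_object",
        "value": {
            "type": "object",
            "properties": {
                "animal": {"type": "string"},
                "habitat": {"type": "string"},
            },
            "required": ["animal", "habitat"],
        },
    },
    "max_tokens": 128,
}

r = requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder endpoint
    json=payload,
    timeout=60,
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```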
New Contributors
- @phangiabao98 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1888
- @edwardzjl made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1925
- @thomas-schillaci made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1869
- @nbroad1881 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1989
- @emmanuel-ferdman made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2002
- @andimarafioti made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2017
- @McPatate made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2038
- @xianbaoqian made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2068
- @tengomucho made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2044
- @LysandreJik made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2074
- @zirconium-n made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2072
- @glegendre01 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2079
- @ur4t made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2101
- @KevinDuffy94 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2076
- @yukiman76 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2095
- @sunxichen made their first contribution in https://github.com/huggingface/text-generation-inference/pull/2089
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v2.0.3...v2.1.0