v1.4.0
Release date: 2024-01-27 02:07:45
Highlights
- OpenAI compatible API #1427
- exllama v2 Tensor Parallel #1490
- GPTQ support for AMD GPUs #1489
- Phi support #1442
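The headline change is the OpenAI-compatible chat completions route (#1427), served at `POST /v1/chat/completions`. A minimal sketch of calling it with only the Python standard library, assuming a TGI server running locally on port 8080 (the `model` value is a placeholder; TGI serves a single model and does not route by name):

```python
import json
import urllib.request

# Assumed local TGI endpoint; adjust host/port to your deployment.
CHAT_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(messages, max_tokens=64, stream=False):
    """Build the JSON payload for the OpenAI-compatible chat route."""
    return {
        "model": "tgi",        # placeholder; TGI ignores it for routing
        "messages": messages,  # [{"role": "user", "content": "..."}, ...]
        "max_tokens": max_tokens,
        "stream": stream,
    }

def chat(messages):
    """POST a chat request and return the decoded JSON response."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_request(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a running server):
#   out = chat([{"role": "user", "content": "What is deep learning?"}])
#   print(out["choices"][0]["message"]["content"])
```

Because the route follows the OpenAI schema, existing OpenAI client libraries can also be pointed at the TGI base URL instead of hand-building requests like this.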
What's Changed
- fix: fix local loading for .bin models by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1419
- Fix missing make target platform for local install: 'install-flash-attention-v2' by @deepily in https://github.com/huggingface/text-generation-inference/pull/1414
- fix: follow base model for tokenizer in router by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1424
- Fix local load for Medusa by @PYNing in https://github.com/huggingface/text-generation-inference/pull/1420
- Return prompt vs generated tokens. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1436
- feat: supports openai chat completions API by @drbh in https://github.com/huggingface/text-generation-inference/pull/1427
- feat: support raise_exception, bos and eos tokens by @drbh in https://github.com/huggingface/text-generation-inference/pull/1450
- chore: bump rust version and annotate/fix all clippy warnings by @drbh in https://github.com/huggingface/text-generation-inference/pull/1455
- feat: conditionally toggle chat on invocations route by @drbh in https://github.com/huggingface/text-generation-inference/pull/1454
- Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API by @EndlessReform in https://github.com/huggingface/text-generation-inference/pull/1470
- Fixing non divisible embeddings. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1476
- Add messages api compatibility docs by @drbh in https://github.com/huggingface/text-generation-inference/pull/1478
- Add a new `/tokenize` route to get the tokenized input by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1471
- feat: adds phi model by @drbh in https://github.com/huggingface/text-generation-inference/pull/1442
- fix: read stderr in download by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1486
- fix: show warning with tokenizer config parsing error by @drbh in https://github.com/huggingface/text-generation-inference/pull/1488
- fix: launcher doc typos by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1473
- Reinstate exl2 with tp by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1490
- Add sealion mpt support by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1477
- Trying to fix that flaky test. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1491
- fix: launcher doc typos by @thelinuxkid in https://github.com/huggingface/text-generation-inference/pull/1462
- Update the docs to include newer models. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1492
- GPTQ support on ROCm by @fxmarty in https://github.com/huggingface/text-generation-inference/pull/1489
- feat: add tokenizer-config-path to launcher args by @drbh in https://github.com/huggingface/text-generation-inference/pull/1495
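Among the changes above, the new `/tokenize` route (#1471) returns the tokenization of an input without running generation. A hedged sketch of calling it with the Python standard library, assuming a local server on port 8080 and that the route accepts an `inputs` field like `/generate` does:

```python
import json
import urllib.request

# Assumed local TGI endpoint; adjust host/port to your deployment.
TOKENIZE_URL = "http://localhost:8080/tokenize"

def build_tokenize_request(text):
    """Build the JSON payload for the /tokenize route (assumed shape)."""
    return {"inputs": text}

def tokenize(text):
    """POST text to /tokenize and return the server's token list."""
    req = urllib.request.Request(
        TOKENIZE_URL,
        data=json.dumps(build_tokenize_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a running server):
#   for tok in tokenize("Hello world"):
#       print(tok)
```

This is useful for counting prompt tokens client-side before submitting a generation request, e.g. to stay under a context-length budget.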
New Contributors
- @deepily made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1414
- @PYNing made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1420
- @drbh made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1427
- @EndlessReform made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1470
- @thelinuxkid made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1462
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v1.3.4...v1.4.0