v2.0.0
Release date: 2024-04-13 00:44:16
TGI is back to Apache 2.0!
Highlights
- License was reverted to Apache 2.0
- CUDA graphs are now used by default. They improve latency substantially on high-end nodes.
- Llava-next was added. It is the second multimodal model available on TGI after Idefics.
- Cohere Command R+ support. TGI is the fastest open-source backend for Command R+.
- FP8 support.
- We now share the vocabulary for all medusa heads, greatly improving latency and memory use.
Try out Command R+ with Medusa heads on 4xA100s with:
```
model=text-generation-inference/commandrplus-medusa
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0 \
    --model-id $model --speculate 3 --num-shard 4
```
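Once the container is up, you can send it requests over HTTP. A minimal sketch of a request against the standard /generate endpoint, assuming the server above is listening on the mapped port 8080:

```
curl http://localhost:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```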
What's Changed
- Add cuda graphs sizes and make it default. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1703
- Pickle conversion now requires `--trust-remote-code`. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1704
- Push users to streaming in the readme (a streaming sketch follows the list below). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1698
- Fixing cohere tokenizer. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1697
- Force weights_only (before fully breaking pickle files anyway). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1710
- Regenerate ld.so.cache by @oOraph in https://github.com/huggingface/text-generation-inference/pull/1708
- Revert license to Apache 2.0 by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1714
- Automatic quantization config. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1719
- Adding Llava-Next (Llava 1.6) with full support. by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1709
- fix: fix CohereForAI/c4ai-command-r-plus by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1707
- Update libraries by @abhishekkrthakur in https://github.com/huggingface/text-generation-inference/pull/1713
- Dev/mask ldconfig output v2 by @oOraph in https://github.com/huggingface/text-generation-inference/pull/1716
- Fp8 Support by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1726
- Upgrade EETQ (Fixes the cuda graphs). by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1729
- fix(router): fix a possible deadlock in next_batch by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1731
- chore(cargo-toml): apply lto fat and codegen-units of one by @somehowchris in https://github.com/huggingface/text-generation-inference/pull/1651
- Improve the defaults for the launcher by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1727
- feat: medusa shared by @OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1734
- Fix typo in guidance.md by @eltociear in https://github.com/huggingface/text-generation-inference/pull/1735
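Since the readme now steers users toward streaming (see #1698 above), here is a minimal sketch of a streaming request against the /generate_stream endpoint, assuming the same server from the example above is running; generated tokens arrive as server-sent events:

```
curl -N http://localhost:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```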
New Contributors
- @somehowchris made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1651
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v1.4.5...v2.0.0