v1.8
Release date: 2024-06-27 10:38:24
Latest release of oobabooga/text-generation-webui: v1.14 (2024-08-20 12:29:43)
Releases with version numbers are back! The last one was v1.7 on October 8th, 2023, so I am calling this one v1.8.
From this release on, it will be possible to install past releases by downloading the .zip source and running the `start_` script in it. The installation script no longer updates to the latest version automatically. This doesn't apply to snapshots/releases before this one.
New backend
- Add TensorRT-LLM support.
  - That's now the fastest backend in the project.
  - It currently has to be installed in a separate Python 3.10 environment.
  - A Dockerfile is provided.
  - For instructions on how to convert models, consult #5715 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md.
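Once a converted engine is loaded in the UI, generation goes through the same OpenAI-compatible API as the other backends. A minimal sketch of such a request, assuming the server was started with the `--api` flag and is listening on the default port 5000 (the prompt and sampling parameters are just placeholders):

```python
# Minimal sketch: query a loaded model (e.g. a TensorRT-LLM engine) through the
# project's OpenAI-compatible API. Assumes the server was started with --api
# and is listening on the default address; adjust the URL and prompt as needed.
import requests

url = "http://127.0.0.1:5000/v1/completions"
payload = {
    "prompt": "Write a haiku about GPUs.",
    "max_tokens": 64,
    "temperature": 0.7,
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```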
UI updates
- Improved "past chats" menu: this menu is now a vertical list of text items instead of a dropdown menu, making it a lot easier to switch between past conversations. Only one click is required instead of two.
- Store the chat history in the browser: if you restart the server and do not refresh the browser, your conversation will not be accidentally erased anymore.
- Avoid some unnecessary calls to the backend, making the UI faster and more responsive.
- Move the "Character" droprown menu to the main Chat tab, to make it faster to switch between different characters.
- Change limits of RoPE scaling sliders in UI (#6142). Thanks @GodEmperor785.
- Do not expose "alpha_value" for llama.cpp and "rope_freq_base" for transformers, to keep things simple and avoid conversions (the relation between the two parameters is sketched after this list).
- Remove an obsolete info message intended for GPTQ-for-LLaMa.
- Remove the "Tab" shortcut to switch between the generation tabs and the "Parameter" tabs, as it was awkward.
- Improve the streaming of lists, which would sometimes flicker and temporarily display horizontal lines.
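For reference, the conversion that the UI no longer performs between those two parameters is the usual NTK-aware scaling relation. A minimal sketch, assuming the common formula rope_freq_base = base * alpha^(dim/(dim-2)) with a default base of 10000 and 128-dimensional attention heads; individual loaders and models may use different values:

```python
# Minimal sketch of the alpha_value <-> rope_freq_base relation the UI used to
# convert between. Assumes the common NTK-aware scaling formula
# rope_freq_base = base * alpha ** (dim / (dim - 2)) with 128-dimensional heads
# and a default base of 10000; individual models may differ.

def alpha_to_rope_freq_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Convert a llama.cpp-style NTK alpha factor into a RoPE frequency base."""
    return base * alpha ** (head_dim / (head_dim - 2))


def rope_freq_base_to_alpha(freq_base: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Recover the alpha factor from a RoPE frequency base."""
    return (freq_base / base) ** ((head_dim - 2) / head_dim)


print(alpha_to_rope_freq_base(2.0))      # ~20221.0
print(rope_freq_base_to_alpha(20221.0))  # ~2.0
```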
Bug fixes
- Revert the reentrant generation lock to a simple lock, fixing an issue caused by the change.
- Fix GGUFs with no BOS token present, mainly qwen2 models (#6119). Thanks @Ph0rk0z.
- Fix "500 error" issue caused by
block_requests.py
(#5976). Thanks @nero-dv. - Setting default alpha_value and fixing loading some newer DeepSeekCoder GGUFs (#6111). Thanks @mefich.
Library updates
- llama-cpp-python: bump to 0.2.79 (after a month of wrestling with GitHub Actions).
- ExLlamaV2: bump to 0.1.6.
- flash-attention: bump to 2.5.9.post1.
- PyTorch: bump to 2.2.2. That's the last 2.2 patch version.
- HQQ: bump to 0.1.7.post3. Makes HQQ functional again.
Other updates
- Do not "git pull" during installation, allowing previous releases (from this one on) to be installed.
- Make logs more readable, no more \u7f16\u7801 (#6127). Thanks @Touch-Night.
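The escaped \uXXXX sequences came from ASCII-only JSON serialization of non-Latin text. A minimal illustration of the underlying behavior (not the project's actual logging code):

```python
# Illustration of the readability fix: with ensure_ascii=True (the json module's
# default), non-Latin characters are escaped as \uXXXX sequences; with
# ensure_ascii=False they are written out as-is.
import json

message = {"prompt": "编码"}

print(json.dumps(message))                      # {"prompt": "\u7f16\u7801"}
print(json.dumps(message, ensure_ascii=False))  # {"prompt": "编码"}
```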
Support this project
- Become a GitHub Sponsor ❤️
- Buy me a ko-fi ☕