v1.1.4
Release date: 2023-04-21 10:26:19
Highlights
🗝️ 8-bit servers support more GPUs. A bitsandbytes update brings 8-bit support to older generations of NVIDIA GPUs, as well as the GeForce 16 series (e.g., GTX 1660 Ti). Please try Petals 1.1.4 if you previously saw errors like `Your GPU does not support Int8 Matmul!` or `cublasLt ran into an error!` on some GPUs. This version also loads weights in 8-bit by default when tensor parallelism is enabled.
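As a quick sanity check, you can exercise bitsandbytes int8 matmul directly with a single 8-bit linear layer. This is a minimal sketch unrelated to the Petals codebase, assuming a CUDA device; on unsupported GPU/bitsandbytes combinations, the forward pass fails with errors like the ones quoted above:

```python
import torch
import bitsandbytes as bnb

# A single 8-bit linear layer; moving it to CUDA quantizes the weights to int8.
layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda()
x = torch.randn(4, 64, dtype=torch.float16, device="cuda")

# Before the bitsandbytes update, this forward pass raised
# "Your GPU does not support Int8 Matmul!" on older GPUs.
print(layer(x).shape)  # torch.Size([4, 64])
```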
⏱️ Servers start faster. Servers take ~2x less time to load block weights from the disk cache into GPU memory. The next release will also reduce the time it takes to download the weights from the Internet, since they will be downloaded in 8-bit instead of 16-bit.
🧵 Multi-threaded clients work faster. Previously, multi-threaded clients performed only one network request at a time due to a bug in hivemind, which has since been fixed. This significantly improves the speed of the chat.petals.ml app when multiple users chat concurrently.
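A rough sketch of the usage pattern that benefits from this fix, assuming the public bigscience/bloom-petals model and concurrent generate() calls issued from separate threads (the prompts are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumed public swarm checkpoint

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

def generate(prompt: str) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=16)
    return tokenizer.decode(output_ids[0])

# With the hivemind fix, these two requests overlap on the network
# instead of being serialized one at a time.
with ThreadPoolExecutor(max_workers=2) as pool:
    for text in pool.map(generate, ["A cat sat on", "The capital of France is"]):
        print(text)
```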
⏱️ Clients start faster. Clients take ~10% less time to load the model, since they build a route through remote servers in parallel with loading the local part of the model (input/output embeddings).
🌳 Relaxed dependency requirements. We relaxed the version requirements for transformers and other Hugging Face libraries, so you can update them independently of Petals. In particular, Petals works with PyTorch 2.0 and the latest transformers release. We also fixed a bug where the client loaded the model in float32 by default (instead of bfloat16/float16) with some transformers releases. Please try Petals 1.1.4 if you previously had out-of-memory errors when running the client.
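If you still hit out-of-memory errors, you can pin the dtype explicitly when loading the client model. A minimal sketch using the standard from_pretrained interface (the model name is the public BLOOM swarm checkpoint; with this release, config.torch_dtype is honored by default, so the explicit argument is just a safeguard):

```python
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Petals 1.1.4 forces transformers to use config.torch_dtype (bfloat16 for BLOOM);
# passing torch_dtype explicitly guards against a float32 fallback in older setups.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
```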
What's Changed
- Speed up loading blocks using init with meta weights by @mryab in https://github.com/bigscience-workshop/petals/pull/285
- Add benchmarks to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/284
- Fix invalid author email in setup.cfg by @borzunov in https://github.com/bigscience-workshop/petals/pull/287
- Hotfix: Increase daemon_startup_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/292
- Update bitsandbytes, hivemind, transformers by @justheuristic in https://github.com/bigscience-workshop/petals/pull/290
- Fix deps, enable 8-bit by default for TP by @borzunov in https://github.com/bigscience-workshop/petals/pull/298
- Add Python 3.10 to CI by @borzunov in https://github.com/bigscience-workshop/petals/pull/299
- Remove CustomLinear8bitLt by @borzunov in https://github.com/bigscience-workshop/petals/pull/297
- Remove use_auto_relay=True in client by @borzunov in https://github.com/bigscience-workshop/petals/pull/300
- Start SequenceManager's thread only after first .make_sequence() by @borzunov in https://github.com/bigscience-workshop/petals/pull/301
- Require bitsandbytes == 0.38.0.post2, hivemind == 1.1.7 by @borzunov in https://github.com/bigscience-workshop/petals/pull/302
- Suggest commands for Docker first by @borzunov in https://github.com/bigscience-workshop/petals/pull/304
- Relax the rest of Hugging Face dependencies by @borzunov in https://github.com/bigscience-workshop/petals/pull/305
- Force transformers to use config.torch_dtype by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/307
- Bump version to 1.1.4 by @borzunov in https://github.com/bigscience-workshop/petals/pull/306
Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.3...v1.1.4