v1.1.4
Release date: 2023-04-21 10:26:19
Highlights
🗝️ 8-bit servers support more GPUs. A bitsandbytes update brings 8-bit support to older generations of NVIDIA GPUs, as well as the GeForce 16 series (e.g., GTX 1660 Ti). Please try Petals 1.1.4 if you previously saw errors like `Your GPU does not support Int8 Matmul!` or `cublasLt ran into an error!` on some GPUs. This version also loads weights in 8-bit by default when tensor parallelism is enabled.
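As a quick sanity check, you can exercise bitsandbytes int8 matmul directly with a single 8-bit linear layer. This is a minimal sketch unrelated to the Petals codebase, assuming a CUDA device; on unsupported GPU/bitsandbytes combinations, the forward pass fails with errors like the ones quoted above:

```python
import torch
import bitsandbytes as bnb

# A single 8-bit linear layer; moving it to CUDA quantizes the weights to int8.
layer = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda()
x = torch.randn(4, 64, dtype=torch.float16, device="cuda")

# Before the bitsandbytes update, this forward pass raised
# "Your GPU does not support Int8 Matmul!" on older GPUs.
print(layer(x).shape)  # torch.Size([4, 64])
```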
⏱️ Servers start faster. Servers take ~2x less time to load block weights from the disk cache into GPU memory. The next release will also reduce the time it takes to download the weights from the Internet, since they will be downloaded in 8-bit instead of 16-bit.
🧵 Multi-threaded clients work faster. Previously, multi-threaded clients performed only one network request at a time due to a bug in hivemind, which has since been fixed. This significantly improves the speed of the chat.petals.ml app when multiple users chat concurrently.
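A rough sketch of the usage pattern that benefits from this fix, assuming the public bigscience/bloom-petals model and concurrent generate() calls issued from separate threads (the prompts are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumed public swarm checkpoint

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

def generate(prompt: str) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=16)
    return tokenizer.decode(output_ids[0])

# With the hivemind fix, these two requests overlap on the network
# instead of being serialized one at a time.
with ThreadPoolExecutor(max_workers=2) as pool:
    for text in pool.map(generate, ["A cat sat on", "The capital of France is"]):
        print(text)
```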
⏱️ Clients start faster. Clients take ~10% less time to load the model, since they build a route through remote servers in parallel with loading the local part of the model (input/output embeddings).
🌳 Relaxed dependency requirements. We relaxed the version requirements for transformers and other Hugging Face libraries, so you can update them independently of Petals. In particular, Petals works with PyTorch 2.0 and the latest transformers release. We also fixed a bug where the client loaded the model in float32 by default (instead of bfloat16/float16) with some transformers releases. Please try Petals 1.1.4 if you previously had out-of-memory errors when running the client.
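If you still hit out-of-memory errors, you can pin the dtype explicitly when loading the client model. A minimal sketch using the standard from_pretrained interface (the model name is the public BLOOM swarm checkpoint; with this release, config.torch_dtype is honored by default, so the explicit argument is just a safeguard):

```python
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Petals 1.1.4 forces transformers to use config.torch_dtype (bfloat16 for BLOOM);
# passing torch_dtype explicitly guards against a float32 fallback in older setups.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
```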
What's Changed
- Speed up loading blocks using init with meta weights by @mryab in https://github.com/bigscience-workshop/petals/pull/285
- Add benchmarks to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/284
- Fix invalid author email in setup.cfg by @borzunov in https://github.com/bigscience-workshop/petals/pull/287
- Hotfix: Increase daemon_startup_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/292
- Update bitsandbytes, hivemind, transformers by @justheuristic in https://github.com/bigscience-workshop/petals/pull/290
- Fix deps, enable 8-bit by default for TP by @borzunov in https://github.com/bigscience-workshop/petals/pull/298
- Add Python 3.10 to CI by @borzunov in https://github.com/bigscience-workshop/petals/pull/299
- Remove CustomLinear8bitLt by @borzunov in https://github.com/bigscience-workshop/petals/pull/297
- Remove use_auto_relay=True in client by @borzunov in https://github.com/bigscience-workshop/petals/pull/300
- Start SequenceManager's thread only after first .make_sequence() by @borzunov in https://github.com/bigscience-workshop/petals/pull/301
- Require bitsandbytes == 0.38.0.post2, hivemind == 1.1.7 by @borzunov in https://github.com/bigscience-workshop/petals/pull/302
- Suggest commands for Docker first by @borzunov in https://github.com/bigscience-workshop/petals/pull/304
- Relax the rest of Hugging Face dependencies by @borzunov in https://github.com/bigscience-workshop/petals/pull/305
- Force transformers to use config.torch_dtype by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/307
- Bump version to 1.1.4 by @borzunov in https://github.com/bigscience-workshop/petals/pull/306
Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.3...v1.1.4