v2.2.0
Release date: 2023-09-07 01:29:56
Highlights
🦅 Falcon support. Petals now supports all models based on Falcon, including Falcon 180B released today. We improved the 🤗 Transformers FalconModel implementation to be up to 40% faster on recent GPUs. Our chatbot app runs Falcon 180B-Chat at ~2 tokens/sec.
Falcon-40B is licensed under Apache 2.0, so you can load it by specifying tiiuae/falcon-40b or tiiuae/falcon-40b-instruct as the model name. Falcon-180B is licensed under a custom license, and it is not clear if we can provide a Python interface for inference and fine-tuning of this model. Right now, it is only available in the chatbot app, and we are waiting for further clarifications from TII on this issue.
🍏 Native macOS support. You can run Petals clients and servers on macOS natively - just install Homebrew and run these commands:
```
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server petals-team/StableBeluga2
```
If your computer has an Apple M1/M2 chip, the Petals server will use the integrated GPU automatically. We recommend hosting only Llama-based models, since other supported architectures do not yet work efficiently on M1/M2 chips. We also recommend using Python 3.10+ on macOS (installed by Homebrew automatically).
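If you want to confirm that the server will pick up the integrated GPU, you can check PyTorch's MPS backend before starting it. This is a small sanity-check sketch, not part of the release itself:

```python
# Sanity-check sketch (not from the release): verify that PyTorch exposes
# the Apple M1/M2 integrated GPU through its MPS backend.
import torch

if torch.backends.mps.is_available():
    print("MPS available: the Petals server can use the integrated GPU")
else:
    print("MPS unavailable: the server will fall back to CPU")
```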
🔌 Serving custom models. Custom models now automatically show up at https://health.petals.dev as "not officially supported" models. As a reminder, you are not limited to the models listed at https://health.petals.dev and can run a server hosting any model based on the BLOOM, Llama, or Falcon architecture (provided that the model license allows it), or even add support for a new architecture yourself. We also improved Petals compatibility with some popular Llama-based models (e.g., models from NousResearch) in this release.
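As an example of serving a custom model, you can point run_server at any 🤗 Hub repository with a supported architecture, just as in the macOS commands above. The repo name here is only an illustration, using one of the NousResearch Llama-based models mentioned in this release:

```
python3 -m petals.cli.run_server NousResearch/Nous-Hermes-Llama2-13b
```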
🐞 Bug fixes. This release also fixes inference of prefix-tuned models, which was broken in Petals 2.1.0.
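For context, prefix tuning in Petals attaches trainable prompt embeddings on the client side. A minimal sketch of loading such a model, assuming the tuning_mode and pre_seq_len arguments from the project's published fine-tuning examples:

```python
# Sketch of loading a model with client-side prefix tuning ("ptune");
# the argument names follow Petals' fine-tuning examples and should be
# treated as assumptions, not release-notes content.
from petals import AutoDistributedModelForCausalLM

model = AutoDistributedModelForCausalLM.from_pretrained(
    "petals-team/StableBeluga2",
    tuning_mode="ptune",  # attach trainable prefix embeddings on the client
    pre_seq_len=16,       # number of trainable prefix tokens
)
# Inference with prefix-tuned models like this is what v2.2.0 fixes.
```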
What's Changed
- Require transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/479
- Fix requiring transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/480
- Rewrite MemoryCache alloc_timeout logic by @justheuristic in https://github.com/bigscience-workshop/petals/pull/434
- Refactor readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/482
- Support macOS natively by @borzunov in https://github.com/bigscience-workshop/petals/pull/477
- Remove no-op process in PrioritizedTaskPool by @borzunov in https://github.com/bigscience-workshop/petals/pull/484
- Fix .generate(input_ids=...) by @borzunov in https://github.com/bigscience-workshop/petals/pull/485
- Wait for DHT storing state OFFLINE on shutdown by @borzunov in https://github.com/bigscience-workshop/petals/pull/486
- Fix race condition in MemoryCache by @borzunov in https://github.com/bigscience-workshop/petals/pull/487
- Replace dots in repo names when building DHT prefixes by @borzunov in https://github.com/bigscience-workshop/petals/pull/489
- Create model index in DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/491
- Force use_cache=True by @borzunov in https://github.com/bigscience-workshop/petals/pull/496
- Force use_cache=True in config only by @borzunov in https://github.com/bigscience-workshop/petals/pull/497
- Add Falcon support by @borzunov in https://github.com/bigscience-workshop/petals/pull/499
- Fix prompt tuning after #464 by @borzunov in https://github.com/bigscience-workshop/petals/pull/501
- Optimize the Falcon block for inference by @mryab in https://github.com/bigscience-workshop/petals/pull/500
Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.1.0...v2.2.0