v2.2.0
Release date: 2023-09-07 01:29:56
Highlights
🦅 Falcon support. Petals now supports all models based on Falcon, including Falcon 180B released today. We improved the 🤗 Transformers FalconModel implementation to be up to 40% faster on recent GPUs. Our chatbot app runs Falcon 180B-Chat at ~2 tokens/sec.
Falcon-40B is licensed under Apache 2.0, so you can load it by specifying tiiuae/falcon-40b or tiiuae/falcon-40b-instruct as the model name. Falcon-180B is licensed under a custom license, and it is not clear if we can provide a Python interface for inference and fine-tuning of this model. Right now, it is only available in the chatbot app, and we are waiting for further clarifications from TII on this issue.
🍏 Native macOS support. You can run Petals clients and servers on macOS natively - just install Homebrew and run these commands:
```
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server petals-team/StableBeluga2
```
If your computer has an Apple M1/M2 chip, the Petals server will use the integrated GPU automatically. We recommend hosting only Llama-based models, since other supported architectures do not yet work efficiently on M1/M2 chips. We also recommend using Python 3.10+ on macOS (installed by Homebrew automatically).
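If you want to confirm that the server will pick up the integrated GPU, you can check PyTorch's MPS backend before starting it. This is a small sanity-check sketch, not part of the release itself:

```python
# Sanity-check sketch (not from the release): verify that PyTorch exposes
# the Apple M1/M2 integrated GPU through its MPS backend.
import torch

if torch.backends.mps.is_available():
    print("MPS available: the Petals server can use the integrated GPU")
else:
    print("MPS unavailable: the server will fall back to CPU")
```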
🔌 Serving custom models. Custom models now automatically show up at https://health.petals.dev as "not officially supported" models. As a reminder, you are not limited to the models listed at https://health.petals.dev and can run a server hosting any model based on the BLOOM, Llama, or Falcon architecture (provided that the model license allows it), or even add support for a new architecture yourself. We also improved Petals compatibility with some popular Llama-based models (e.g., models from NousResearch) in this release.
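As an example of serving a custom model, you can point run_server at any 🤗 Hub repository with a supported architecture, just as in the macOS commands above. The repo name here is only an illustration, using one of the NousResearch Llama-based models mentioned in this release:

```
python3 -m petals.cli.run_server NousResearch/Nous-Hermes-Llama2-13b
```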
🐞 Bug fixes. This release also fixes inference of prefix-tuned models, which was broken in Petals 2.1.0.
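For context, prefix tuning in Petals attaches trainable prompt embeddings on the client side. A minimal sketch of loading such a model, assuming the tuning_mode and pre_seq_len arguments from the project's published fine-tuning examples:

```python
# Sketch of loading a model with client-side prefix tuning ("ptune");
# the argument names follow Petals' fine-tuning examples and should be
# treated as assumptions, not release-notes content.
from petals import AutoDistributedModelForCausalLM

model = AutoDistributedModelForCausalLM.from_pretrained(
    "petals-team/StableBeluga2",
    tuning_mode="ptune",  # attach trainable prefix embeddings on the client
    pre_seq_len=16,       # number of trainable prefix tokens
)
# Inference with prefix-tuned models like this is what v2.2.0 fixes.
```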
What's Changed
- Require transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/479
- Fix requiring transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/480
- Rewrite MemoryCache alloc_timeout logic by @justheuristic in https://github.com/bigscience-workshop/petals/pull/434
- Refactor readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/482
- Support macOS natively by @borzunov in https://github.com/bigscience-workshop/petals/pull/477
- Remove no-op process in PrioritizedTaskPool by @borzunov in https://github.com/bigscience-workshop/petals/pull/484
- Fix .generate(input_ids=...) by @borzunov in https://github.com/bigscience-workshop/petals/pull/485
- Wait for DHT storing state OFFLINE on shutdown by @borzunov in https://github.com/bigscience-workshop/petals/pull/486
- Fix race condition in MemoryCache by @borzunov in https://github.com/bigscience-workshop/petals/pull/487
- Replace dots in repo names when building DHT prefixes by @borzunov in https://github.com/bigscience-workshop/petals/pull/489
- Create model index in DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/491
- Force use_cache=True by @borzunov in https://github.com/bigscience-workshop/petals/pull/496
- Force use_cache=True in config only by @borzunov in https://github.com/bigscience-workshop/petals/pull/497
- Add Falcon support by @borzunov in https://github.com/bigscience-workshop/petals/pull/499
- Fix prompt tuning after #464 by @borzunov in https://github.com/bigscience-workshop/petals/pull/501
- Optimize the Falcon block for inference by @mryab in https://github.com/bigscience-workshop/petals/pull/500
Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.1.0...v2.2.0