v2.1.0
Release date: 2023-08-25 00:42:00
Highlights
🔌 Compatibility with 🤗 Transformers generation utils. Petals models now directly use the 🤗 Transformers `.generate()` implementation instead of custom generation code. This means that you can use a variety of generation methods and constraints implemented in 🤗 Transformers (e.g., `repetition_penalty`, beam search, etc.) and expect an exact match between Petals and a model running locally.
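For instance, a minimal end-to-end sketch (the model name and prompt here are placeholders, not part of the release) that passes standard 🤗 Transformers generation arguments to a Petals model:

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # placeholder: any Petals-supported model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
# Standard 🤗 Transformers arguments (repetition_penalty, num_beams, ...) are passed through as-is
output_ids = model.generate(input_ids, max_new_tokens=16, repetition_penalty=1.2)
print(tokenizer.decode(output_ids[0]))
```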
Most common methods are compatible with reusing inference sessions, so that you can run `.generate()` multiple times without reprocessing the dialogue history from scratch:
```python
with model.inference_session(max_length=100):
    outputs1 = model.generate(user_prompt1, repetition_penalty=1.2)
    outputs2 = model.generate(user_prompt2, repetition_penalty=1.2)
```
⚡ Faster loading of Stable Beluga 2. We repacked Stable Beluga 2, the most popular model at the moment, to increase its loading speed and minimize RAM and disk space requirements. The repacked version can be loaded from the `petals-team/StableBeluga2` repository and is fully compatible with clients and servers using the standard repository (`stabilityai/StableBeluga2`).
Now, clients need to download only 1.05 GB of data to run Stable Beluga 2 (instead of ~20 GB needed before) and require only 4 GB of RAM (instead of ~20 GB required before). Servers need to download and store 2x less data and load the model from disk significantly faster. If you're switching from the old repository, don't forget to remove the old cache in the `~/.cache/petals/models--stabilityai--StableBeluga2` directory to save disk space.
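If you prefer to clean up from Python rather than from a shell, here is a small sketch that removes the old cache directory mentioned above (the path is the default Petals cache location):

```python
import shutil
from pathlib import Path

# Delete the weights cached from the old stabilityai/StableBeluga2 repository
old_cache = Path.home() / ".cache" / "petals" / "models--stabilityai--StableBeluga2"
shutil.rmtree(old_cache, ignore_errors=True)
```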
⏱️ More responsive inference. In older versions, servers could become unresponsive for a few seconds while processing large prefixes (thousands of tokens) during inference. This release allows small inference requests (a few tokens) to be served in the middle of processing a large request, avoiding freezes during token-by-token inference caused by someone else processing a large prefix.
🔒 Minor improvements. This release adds support for loading weights in the safetensors format on servers and adds the `blocked_servers` client option to avoid a given set of servers:
```python
from petals import AutoDistributedModelForCausalLM

blocked_servers = ["12D3KooWA6g...", "12D3KooWGyD..."]  # Full peer IDs from https://health.petals.dev
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, blocked_servers=blocked_servers)
```
🐞 Bug fixes. This release also includes a variety of bug fixes that speed up the chatbot app and fine-tuning, better bypass recently disconnected servers, improve the rebalancing algorithm and the usability of benchmarks, and fix throughput measurements and installation on ARM CPUs.
We also fixed Petals compatibility with the latest releases of 🤗 Transformers, Accelerate, and PEFT libraries.
Breaking changes
📖 Default inference sessions. If you run `.generate()` or forward passes inside an `.inference_session()` context, they now use the opened session by default. These snippets are now equivalent:
```python
# Using default session
with model.inference_session(max_length=100):
    output_ids = model.generate(input_ids, max_new_tokens=3)
```

```python
# Explicitly specifying a session
with model.inference_session(max_length=100) as sess:
    output_ids = model.generate(input_ids, max_new_tokens=3, session=sess)
```
Earlier, the first snippet created a new session, which confused many people and led to bugs.
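The same default applies to plain forward passes; a hedged sketch (reusing the names from the snippets above):

```python
with model.inference_session(max_length=100):
    outputs = model(input_ids)                    # runs within the opened session by default
    next_token_logits = outputs.logits[:, -1, :]  # logits for the last supplied token
```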
➡️ Renaming. We renamed `SequenceManagerConfig` to `petals.ClientConfig` and `petals.dht_utils` to `petals.utils.dht`. The old names now lead to `DeprecationWarning`s and will be removed in Petals 2.2.0+.
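A minimal migration sketch for the new names (the old ones keep working until 2.2.0, but emit a `DeprecationWarning`):

```python
# New locations introduced in this release
from petals import ClientConfig  # previously SequenceManagerConfig
import petals.utils.dht          # previously petals.dht_utils
```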
What's Changed
- Fix stale link by @bot66 in https://github.com/bigscience-workshop/petals/pull/418
- Add Discord badge and more Discord links to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/422
- Add connect_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/423
- Add Stable Beluga 2 to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/424
- Penalize servers that use relays during rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/428
- Fix petals.utils.ping for servers with client-mode DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/430
- Fix typo and make blocks message more informative by @vadi2 in https://github.com/bigscience-workshop/petals/pull/437
- Update Discord links from channels to forums by @borzunov in https://github.com/bigscience-workshop/petals/pull/440
- Remove distracting links from readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/441
- Remove deprecated comment in fine-tuning notebook by @borzunov in https://github.com/bigscience-workshop/petals/pull/443
- Use bitsandbytes 0.41.1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/442
- [Refactor] extract block forward, backward and inference into a separate file by @justheuristic in https://github.com/bigscience-workshop/petals/pull/435
- Override float32 in config to bfloat16 by @borzunov in https://github.com/bigscience-workshop/petals/pull/431
- Prefer longer servers for fine-tuning, exclude unreachable by @borzunov in https://github.com/bigscience-workshop/petals/pull/448
- Force using --new_swarm instead of empty --initial_peers by @borzunov in https://github.com/bigscience-workshop/petals/pull/451
- Test Llama, rebalancing, throughput eval, and all CLI scripts by @borzunov in https://github.com/bigscience-workshop/petals/pull/452
- benchmarks: Aggregate speed among workers, set default dtype torch32 by @borzunov in https://github.com/bigscience-workshop/petals/pull/454
- Use torch.cuda.synchronize for compute throughput by @justheuristic in https://github.com/bigscience-workshop/petals/pull/456
- Prioritize short inference, unmerge pools for long inference by @borzunov in https://github.com/bigscience-workshop/petals/pull/458
- Bump version to 2.0.1.post2 by @borzunov in https://github.com/bigscience-workshop/petals/pull/459
- Add blocked_servers argument by @borzunov in https://github.com/bigscience-workshop/petals/pull/462
- Add customizable input tensors by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/445
- Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht by @borzunov in https://github.com/bigscience-workshop/petals/pull/463
- Make client compatible with transformers' GenerationMixin by @borzunov in https://github.com/bigscience-workshop/petals/pull/464
- Temporarily require peft<0.5.0, transformers<4.32.0 by @justheuristic in https://github.com/bigscience-workshop/petals/pull/470
- Support transformers 4.32.x by @justheuristic in https://github.com/bigscience-workshop/petals/pull/471
- Change transformers version assert by @justheuristic in https://github.com/bigscience-workshop/petals/pull/472
- Support loading weights from Safetensors on server by @borzunov in https://github.com/bigscience-workshop/petals/pull/473
- Update peft to 0.5.0 version by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/475
- Hide excess key message by @borzunov in https://github.com/bigscience-workshop/petals/pull/476
- Bump version to 2.1.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/474
- Don't install cpufeature on non-x86_64 machines by @borzunov in https://github.com/bigscience-workshop/petals/pull/478
New Contributors
- @bot66 made their first contribution in https://github.com/bigscience-workshop/petals/pull/418
Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.0.1...v2.1.0