MyGit
🚩收到GitHub仓库的更新通知

bigscience-workshop/petals

Fork: 462 Star: 8682 (更新于 2024-05-01 12:02:34)

license: MIT

Language: Python .

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

最后发布版本: v2.2.0 ( 2023-09-07 01:29:56)

官方网址 GitHub网址

✨免费申请网站SSL证书,支持多域名和泛域名,点击查看


Run large language models at home, BitTorrent-style.
Fine-tuning and inference up to 10x faster than offloading


Generate text with distributed Llama 2 (70B), Falcon (40B+), BLOOM (176B) (or their derivatives), and fine‑tune them for your own tasks — right from your desktop computer or Google Colab:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "petals-team/StableBeluga2"  # This one is fine-tuned Llama 2 (70B)

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...

🚀  Try now in Colab

🔏 Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.

🦙 Want to run Llama 2? Request access to its weights at the ♾️ Meta AI website and 🤗 Model Hub, then run huggingface-cli login in the terminal before loading the model. Or just try it in our chatbot app.

💬 Any questions? Ping us in our Discord!

Connect your GPU and increase Petals capacity

Petals is a community-run system — we rely on people sharing their GPUs. You can check out available models and help serving one of them! As an example, here is how to host a part of Stable Beluga 2 on your GPU:

🐧 Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow this for AMD):

conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server petals-team/StableBeluga2

🪟 Windows + WSL. Follow this guide on our Wiki.

🐋 Docker. Run our Docker image for NVIDIA GPUs (or follow this for AMD):

sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2

🍏 macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:

brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server petals-team/StableBeluga2

📚  Learn more (how to use multiple GPUs, start the server on boot, etc.)

💬 Any questions? Ping us in our Discord!

🦙 Want to host Llama 2? Request access to its weights at the ♾️ Meta AI website and 🤗 Model Hub, generate an 🔑 access token, then add --token YOUR_TOKEN_HERE to the python -m petals.cli.run_server command.

🔒 Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.

🏆 Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify them with --public_name YOUR_NAME.

How does it work?

  • You load a small part of the model, then join a network of people serving the other parts. Single‑batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B) — enough for chatbots and interactive apps.
  • You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.

📜  Read paper            📚  See FAQ

📚 Tutorials, examples, and more

Basic tutorials:

  • Getting started: tutorial
  • Prompt-tune Llama-65B for text semantic classification: tutorial
  • Prompt-tune BLOOM to create a personified chatbot: tutorial

Useful tools:

Advanced guides:

  • Launch a private swarm: guide
  • Run a custom model: guide

Benchmarks

Please see Section 3.3 of our paper.

🛠️ Contributing

Please see our FAQ on contributing.

📜 Citation

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative Inference and Fine-tuning of Large Models. arXiv preprint arXiv:2209.01188, 2022.

@article{borzunov2022petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Ryabinin, Max and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  journal = {arXiv preprint arXiv:2209.01188},
  year = {2022},
  url = {https://arxiv.org/abs/2209.01188}
}

This project is a part of the BigScience research workshop.

最近版本更新:(数据更新于 2024-05-16 16:48:28)

2023-09-07 01:29:56 v2.2.0

2023-08-25 00:42:00 v2.1.0

2023-07-23 22:54:09 v2.0.1

2023-07-20 02:29:48 v2.0.0.post1

2023-05-10 07:03:13 v1.1.5

2023-04-21 10:26:19 v1.1.4

2023-03-01 17:15:25 v1.1.3

2023-01-31 04:38:50 v1.1.2

2023-01-14 05:41:32 v1.1.1

2023-01-10 19:53:49 v1.1.0

主题(topics):

bloom, chatbot, deep-learning, distributed-systems, falcon, gpt, guanaco, language-models, large-language-models, llama, llama2, machine-learning, neural-networks, nlp, pipeline-parallelism, pretrained-models, pytorch, tensor-parallelism, transformer, volunteer-computing

bigscience-workshop/petals同语言 Python最近更新仓库

2024-05-17 15:17:13 xorbitsai/inference

2024-05-17 02:09:05 xtekky/gpt4free

2024-05-16 21:11:59 foundryzero/binder-trace

2024-05-16 17:45:58 harry0703/MoneyPrinterTurbo

2024-05-16 16:42:40 linruowuyin/Fhoe-Rail

2024-05-16 10:55:30 langchain-ai/langchain