February-Gemma-2024
Release date: 2024-02-27 00:10:54
Latest unslothai/unsloth release: September-2024 (2024-09-24 05:32:53)
You can now finetune Gemma 7b 2.43x faster than HF + Flash Attention 2 with 57.5% less VRAM use. Compared to vanilla HF, Unsloth is 2.53x faster and uses 70% less VRAM. Blog post: https://unsloth.ai/blog/gemma
On local machines, update Unsloth via:
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
On 1x A100 80GB GPU, Unsloth can fit 40K total tokens (8192 * bsz of 5), whilst FA2 can fit ~15K tokens and vanilla HF can fit 9K tokens.
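The token figures above are simple arithmetic, sketched below. The constant names and the `ratio` helper are illustrative only and not part of Unsloth, and the FA2 and vanilla HF numbers are the rounded values quoted above, so the ratios are approximate.

```python
# Token budget per batch on one A100 80GB, using the figures quoted above.
SEQ_LEN = 8192       # sequence length from the release note
UNSLOTH_BSZ = 5      # batch size that fits with Unsloth, per the release note

unsloth_tokens = SEQ_LEN * UNSLOTH_BSZ  # 40960, i.e. the "40K" figure

# Rounded baseline figures as quoted in the release note.
FA2_TOKENS = 15_000
HF_TOKENS = 9_000


def ratio(a: int, b: int) -> float:
    """Return how many times larger a is than b, rounded to 2 decimals."""
    return round(a / b, 2)


print(unsloth_tokens)
print(ratio(unsloth_tokens, FA2_TOKENS))
print(ratio(unsloth_tokens, HF_TOKENS))
```

On these rounded inputs, Unsloth fits roughly 2.7 times the tokens of FA2 and roughly 4.5 times those of vanilla HF.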
Gemma 7b Colab notebook (free Tesla T4): https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing
Gemma 2b Colab notebook (free Tesla T4): https://colab.research.google.com/drive/15gGm7x_jTm017_Ic8e317tdIpDG53Mtu?usp=sharing
To use Gemma, simply use FastLanguageModel:
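from unsloth import FastLanguageModel

max_seq_length = 2048  # example value; set this to suit your data

# Load the 4-bit Gemma 7b model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,  # None lets Unsloth auto-detect the dtype
    load_in_4bit = True,  # load 4-bit quantized weights to reduce VRAM use
)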