February-2024
Release date: 2024-02-21 01:01:21
To update Unsloth on a local machine without touching its dependencies, run: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
2x Faster Inference
Unsloth now natively supports 2x faster inference. All QLoRA, LoRA, and non-LoRA inference paths are 2x faster, with no code changes or new dependencies required.
from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 2048  # Use the same settings you trained with
dtype = None           # None auto-detects float16 / bfloat16
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(["Continue the story: Once upon a time,"], return_tensors = "pt").to("cuda") # Example prompt; format it with the same template used in training
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
Chat Templates
Assuming your dataset is a list of lists of dictionaries like the one below:
[
    [{'from': 'human', 'value': 'Hi there!'},
     {'from': 'gpt', 'value': 'Hi how can I help?'},
     {'from': 'human', 'value': 'What is 2+2?'}],
    [{'from': 'human', 'value': "What's your name?"},
     {'from': 'gpt', 'value': "I'm Daniel!"},
     {'from': 'human', 'value': 'Ok! Nice!'},
     {'from': 'gpt', 'value': 'What can I do for you?'},
     {'from': 'human', 'value': 'Oh nothing :)'},],
]
You can use our get_chat_template to format it. Set chat_template to any of zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, or unsloth, and use mapping to map the dictionary keys (from, value, etc.) onto the role and content keys the template expects. map_eos_token allows you to map <|im_end|> to EOS without any training.
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass
from datasets import load_dataset
dataset = load_dataset("philschmid/guanaco-sharegpt-style", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
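To sanity-check the formatting, you can print one templated conversation. The snippet below is an illustrative sketch (not part of the release notes): with the chatml template and the ShareGPT mapping above, a short conversation should render roughly as shown in the comments, though the exact end-of-turn marker depends on map_eos_token, which here maps it onto the model's existing EOS token instead of training a new <|im_end|>.

convo = [{'from': 'human', 'value': 'Hi there!'},
         {'from': 'gpt', 'value': 'Hi how can I help?'}]
print(tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False))
# Roughly:
# <|im_start|>user
# Hi there!<|im_end|>
# <|im_start|>assistant
# Hi how can I help?<|im_end|>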
You can also make your own custom chat templates! For example, our internal chat template is shown below. You must pass in a tuple of (custom_template, eos_token), where the eos_token must be used inside the template.
unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a helpful assistant to the user\n' }}"\
    "{% for message in messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}"\
        "{{ '>>> Assistant: ' }}"\
    "{% endif %}"
unsloth_eos_token = "eos_token"
tokenizer = get_chat_template(
    tokenizer,
    chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)
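As a quick sanity check (an illustrative sketch, not part of the release notes), applying this custom template to a short ShareGPT-style conversation should produce text shaped roughly like the comments below, with bos_token and eos_token rendered as whatever the underlying model uses (shown here as <s> and </s> for a Llama-style tokenizer):

convo = [{'from': 'human', 'value': 'Hi there!'},
         {'from': 'gpt', 'value': 'Hi how can I help?'},
         {'from': 'human', 'value': 'What is 2+2?'}]
print(tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = True))
# Roughly:
# <s>You are a helpful assistant to the user
# >>> User: Hi there!
# >>> Assistant: Hi how can I help?</s>
# >>> User: What is 2+2?
# >>> Assistant: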
And many bug fixes!