v0.7.5
Release date: 2023-12-22 21:09:41
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
IPO & KTO & cDPO loss, DPOTrainer enhancements, automatic tags for xxxTrainer
Important enhancements for DPOTrainer
This release introduces many new features in TRL for DPOTrainer:
- IPO loss for better generalization of the DPO algorithm
- KTO & cDPO loss
- You can also pass pre-computed logits to DPOTrainer
- [DPO] Refactor eval logging of dpo trainer by @mnoukhov in https://github.com/huggingface/trl/pull/954
- Fixes reward and text gathering in distributed training by @edbeeching in https://github.com/huggingface/trl/pull/850
- remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in https://github.com/huggingface/trl/pull/1045
- Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in https://github.com/huggingface/trl/pull/1047
- Fix DPOTrainer + PEFT 2 by @rdk31 in https://github.com/huggingface/trl/pull/1049
- [DPO] IPO Training loss by @kashif in https://github.com/huggingface/trl/pull/1022
- [DPO] cDPO loss by @kashif in https://github.com/huggingface/trl/pull/1035
- [DPO] use ref model logprobs if it exists in the data by @kashif in https://github.com/huggingface/trl/pull/885
- [DPO] save eval_dataset for subsequent calls by @kashif in https://github.com/huggingface/trl/pull/1125
- [DPO] rename kto loss by @kashif in https://github.com/huggingface/trl/pull/1127
- [DPO] add KTO loss by @kashif in https://github.com/huggingface/trl/pull/1075
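To make the difference between these loss variants concrete, here is a minimal per-example sketch in plain Python. It is not TRL's implementation: the function name `preference_loss` and its parameters are hypothetical, it assumes the commonly cited formulas for the DPO sigmoid loss, the cDPO label-smoothed variant, and the IPO squared loss, and it leaves out the KTO loss entirely.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    loss_type: str = "sigmoid",
    label_smoothing: float = 0.0,
) -> float:
    """Hypothetical helper (not a TRL API): loss for one preference pair.

    All inputs are summed log-probabilities of a completion under the
    policy or the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the reference model does.
    h = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )

    if loss_type == "sigmoid":
        # label_smoothing == 0 is the standard DPO loss; a value in (0, 0.5)
        # gives the cDPO variant, which assumes some labels are flipped.
        return (
            -(1.0 - label_smoothing) * log_sigmoid(beta * h)
            - label_smoothing * log_sigmoid(-beta * h)
        )
    if loss_type == "ipo":
        # IPO regresses the log-ratio gap onto the target 1 / (2 * beta)
        # instead of pushing it towards infinity.
        return (h - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"Unknown loss_type: {loss_type!r}")
```

With the reference log-probabilities already available as plain numbers, as in this sketch, no reference-model forward pass is needed at loss time, which is the idea behind passing pre-computed values to the trainer.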
Automatic xxxTrainer tagging on the Hub
Trainers from TRL now automatically add tags such as trl-sft, trl-dpo, and trl-ddpo when pushing models to the Hub
- [xxxTrainer] Add tags to all trainers in TRL by @younesbelkada in https://github.com/huggingface/trl/pull/1120
unsloth 🤝 TRL
We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT and TRL's SFTTrainer and DPOTrainer
- [Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1119
What's Changed
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/970
- [Tests] Add non optional packages tests by @younesbelkada in https://github.com/huggingface/trl/pull/974
- [DOCS] Fix outdated references to examples/ by @alvarobartt in https://github.com/huggingface/trl/pull/977
- Update README.md by @GeekDream-x in https://github.com/huggingface/trl/pull/994
- [DataCollatorForCompletionOnlyLM] Warn on identical eos_token_id and pad_token_id by @MustSave in https://github.com/huggingface/trl/pull/988
- [DataCollatorForCompletionOnlyLM] Add more clarification / guidance in the case tokenizer.pad_token_id == tokenizer.eos_token_id by @younesbelkada in https://github.com/huggingface/trl/pull/992
- make distributed true for multiple process by @allanj in https://github.com/huggingface/trl/pull/997
- Fixed wrong trigger for warning by @zabealbe in https://github.com/huggingface/trl/pull/971
- Update how_to_train.md by @halfrot in https://github.com/huggingface/trl/pull/1003
- Adds requires_grad to input for non-quantized peft models by @younesbelkada in https://github.com/huggingface/trl/pull/1006
- [Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in https://github.com/huggingface/trl/pull/982
- Remove duplicate data loading in rl_training.py by @viethoangtranduong in https://github.com/huggingface/trl/pull/1020
- [Document] Minor fixes of sft_trainer document by @mutichung in https://github.com/huggingface/trl/pull/1029
- Update utils.py by @ZihanWang314 in https://github.com/huggingface/trl/pull/1012
- spelling is hard by @grahamannett in https://github.com/huggingface/trl/pull/1043
- Fixing accelerator version function call. by @ParthaEth in https://github.com/huggingface/trl/pull/1056
- [SFT Trainer] precompute packed iterable into a dataset by @lvwerra in https://github.com/huggingface/trl/pull/979
- Update doc CI by @lewtun in https://github.com/huggingface/trl/pull/1060
- Improve PreTrainedModelWrapper._get_current_device by @billvsme in https://github.com/huggingface/trl/pull/1048
- Update doc for the compute_metrics argument of SFTTrainer by @albertauyeung in https://github.com/huggingface/trl/pull/1062
- [core] Fix failing tests on main by @younesbelkada in https://github.com/huggingface/trl/pull/1065
- [SFTTrainer] Fix Trainer when args is None by @younesbelkada in https://github.com/huggingface/trl/pull/1064
- enable multiple eval datasets by @peter-sk in https://github.com/huggingface/trl/pull/1052
- Add missing loss_type in ValueError message by @alvarobartt in https://github.com/huggingface/trl/pull/1067
- Add args to SFT example by @lewtun in https://github.com/huggingface/trl/pull/1079
- add local folder support as input for rl_training. by @sywangyi in https://github.com/huggingface/trl/pull/1078
- Make CI happy by @younesbelkada in https://github.com/huggingface/trl/pull/1080
- Removing tyro in sft_llama2.py by @vwxyzjn in https://github.com/huggingface/trl/pull/1081
- Log arg consistency by @tcapelle in https://github.com/huggingface/trl/pull/1084
- Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in https://github.com/huggingface/trl/pull/1092
- [Feature] Add Ascend NPU accelerator support by @statelesshz in https://github.com/huggingface/trl/pull/1096
- peft_module_casting_to_bf16 util method, append_concat_token flag, remove callback PeftSavingCallback by @pacman100 in https://github.com/huggingface/trl/pull/1110
- Make prepending of bos token configurable. by @pacman100 in https://github.com/huggingface/trl/pull/1114
- fix gradient checkpointing when using PEFT by @pacman100 in https://github.com/huggingface/trl/pull/1118
- Update description in setup.py by @alvarobartt in https://github.com/huggingface/trl/pull/1101
New Contributors
- @alvarobartt made their first contribution in https://github.com/huggingface/trl/pull/977
- @GeekDream-x made their first contribution in https://github.com/huggingface/trl/pull/994
- @MustSave made their first contribution in https://github.com/huggingface/trl/pull/988
- @allanj made their first contribution in https://github.com/huggingface/trl/pull/997
- @zabealbe made their first contribution in https://github.com/huggingface/trl/pull/971
- @viethoangtranduong made their first contribution in https://github.com/huggingface/trl/pull/1020
- @mutichung made their first contribution in https://github.com/huggingface/trl/pull/1029
- @ZihanWang314 made their first contribution in https://github.com/huggingface/trl/pull/1012
- @grahamannett made their first contribution in https://github.com/huggingface/trl/pull/1043
- @ChanderG made their first contribution in https://github.com/huggingface/trl/pull/1045
- @rdk31 made their first contribution in https://github.com/huggingface/trl/pull/1049
- @ParthaEth made their first contribution in https://github.com/huggingface/trl/pull/1056
- @billvsme made their first contribution in https://github.com/huggingface/trl/pull/1048
- @albertauyeung made their first contribution in https://github.com/huggingface/trl/pull/1062
- @peter-sk made their first contribution in https://github.com/huggingface/trl/pull/1052
- @sywangyi made their first contribution in https://github.com/huggingface/trl/pull/1078
- @tcapelle made their first contribution in https://github.com/huggingface/trl/pull/1084
- @cm2435 made their first contribution in https://github.com/huggingface/trl/pull/1092
- @statelesshz made their first contribution in https://github.com/huggingface/trl/pull/1096
- @pacman100 made their first contribution in https://github.com/huggingface/trl/pull/1110
Full Changelog: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5