v0.9.6
Release date: 2024-07-08 21:51:10
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
We are excited to introduce the new v0.9.6 release, which brings many exciting new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free method that also regularizes output length. To use this loss, users can set `loss_type="simpo"` and `cpo_alpha=0` in the `CPOConfig` and use it with the `CPOTrainer`.
- Added AlignProp by @mihirp1998, a method for fine-tuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji.
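To give a feel for what the SimPO highlight above changes, here is a minimal sketch of the length-normalized, reference-free SimPO objective in plain Python. This is an illustration based on the published SimPO formulation, not TRL's actual implementation; the function name, argument names, and the `beta`/`gamma` defaults are hypothetical.

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """Sketch of the SimPO objective: a length-normalized, reference-free
    preference loss. Unlike DPO, no reference-model log-probs are needed,
    and dividing by sequence length regularizes output length.

    logp_* are summed token log-probabilities of each response under the
    policy; len_* are the response lengths in tokens.
    """
    # Length-normalized implicit rewards for the chosen and rejected responses.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    # Reward margin, offset by a target margin gamma.
    margin = reward_chosen - reward_rejected - gamma
    # Negative log-sigmoid of the margin, i.e. log(1 + e^(-margin)),
    # written in a numerically stable softplus form.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
```

In TRL itself, per the release notes, this loss is selected by passing `loss_type="simpo"` and `cpo_alpha=0` to `CPOConfig` and training with `CPOTrainer`.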
We also included many important fixes and improvements such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
What's Changed
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1710
- Add a variant of CPO, SimPO by @fe1ixxu in https://github.com/huggingface/trl/pull/1703
- [RPO] fix nll loss by @kashif in https://github.com/huggingface/trl/pull/1705
- fix yaml parser for derived config classes by @mnoukhov in https://github.com/huggingface/trl/pull/1713
- Fix default padding_value in dpo_config.py by @mnoukhov in https://github.com/huggingface/trl/pull/1692
- feat(ci): add trufflehog secrets detection by @McPatate in https://github.com/huggingface/trl/pull/1721
- ktotrainer: Refuse datasets which contain only one class of labels by @jetlime in https://github.com/huggingface/trl/pull/1724
- adds AOT by @imelnyk in https://github.com/huggingface/trl/pull/1701
- Workflow: Notify tests results on slack channel by @younesbelkada in https://github.com/huggingface/trl/pull/1744
- better trl parser with yaml config by @mnoukhov in https://github.com/huggingface/trl/pull/1739
- CI / core: Pin `numpy` to `!=2.0.0` for CI and to users by @younesbelkada in https://github.com/huggingface/trl/pull/1747
- `TrlParser`: Add ignore extra args option by @younesbelkada in https://github.com/huggingface/trl/pull/1748
- small KTO fixes by @kawine in https://github.com/huggingface/trl/pull/1734
- CPO / DPO: Fix red CI by @younesbelkada in https://github.com/huggingface/trl/pull/1749
- prepare deepspeed to accommodate fp16 and bf16 by @mnoukhov in https://github.com/huggingface/trl/pull/1728
- CI / `KTOTrainer`: Remove old tests by @younesbelkada in https://github.com/huggingface/trl/pull/1750
- change the `process` function in the example of DPO by @AIR-hl in https://github.com/huggingface/trl/pull/1753
- Integrate f-divergence to DPO (Follow up) by @1485840691 in https://github.com/huggingface/trl/pull/1610
- Support for returning past_key_values from the model by @idanshen in https://github.com/huggingface/trl/pull/1742
- Fix masking of response tokens by @mertsayar8 in https://github.com/huggingface/trl/pull/1718
- Support num_train_epochs by @vwxyzjn in https://github.com/huggingface/trl/pull/1743
- Fix: Add dataset_text_field in examples/scripts/sft.py by @scottsuk0306 in https://github.com/huggingface/trl/pull/1758
- New sentiment and descriptiveness dataset by @vwxyzjn in https://github.com/huggingface/trl/pull/1757
- Add CPO-SimPO method by @fe1ixxu in https://github.com/huggingface/trl/pull/1760
- Added Reward Backpropagation Support by @mihirp1998 in https://github.com/huggingface/trl/pull/1585
- MoE Models: option to add load balancing loss by @claralp in https://github.com/huggingface/trl/pull/1765
- `evaluation_strategy` to `eval_strategy` by @qgallouedec in https://github.com/huggingface/trl/pull/1771
- add Efficient Exact Optimization (EXO) by @haozheji in https://github.com/huggingface/trl/pull/1735
- Remove the leading space in the tldr preference dataset by @vwxyzjn in https://github.com/huggingface/trl/pull/1773
- Fix Documentation Overflow Issues for Long URLs in SFTConfig by @Mubin17 in https://github.com/huggingface/trl/pull/1774
- Visual DPO by @qgallouedec in https://github.com/huggingface/trl/pull/1647
- [DOCS] fix docs and cli example script by @kashif in https://github.com/huggingface/trl/pull/1780
- Fixed typo in SFT trainer docs by @detsutut in https://github.com/huggingface/trl/pull/1788
- [SFT] add model_init_kwargs to training_args by @kashif in https://github.com/huggingface/trl/pull/1787
- Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig by @noahlt in https://github.com/huggingface/trl/pull/1794
- Clean examples by @qgallouedec in https://github.com/huggingface/trl/pull/1791
- Remove extra print in reward_trainer.py by @mnoukhov in https://github.com/huggingface/trl/pull/1799
- Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI by @alvarobartt in https://github.com/huggingface/trl/pull/1807
- Fix `TRL_USE_RICH` environment variable handling by @alvarobartt in https://github.com/huggingface/trl/pull/1808
- 0.9.6 release by @vwxyzjn in https://github.com/huggingface/trl/pull/1816
New Contributors
- @McPatate made their first contribution in https://github.com/huggingface/trl/pull/1721
- @jetlime made their first contribution in https://github.com/huggingface/trl/pull/1724
- @imelnyk made their first contribution in https://github.com/huggingface/trl/pull/1701
- @AIR-hl made their first contribution in https://github.com/huggingface/trl/pull/1753
- @1485840691 made their first contribution in https://github.com/huggingface/trl/pull/1610
- @idanshen made their first contribution in https://github.com/huggingface/trl/pull/1742
- @mertsayar8 made their first contribution in https://github.com/huggingface/trl/pull/1718
- @scottsuk0306 made their first contribution in https://github.com/huggingface/trl/pull/1758
- @mihirp1998 made their first contribution in https://github.com/huggingface/trl/pull/1585
- @haozheji made their first contribution in https://github.com/huggingface/trl/pull/1735
- @Mubin17 made their first contribution in https://github.com/huggingface/trl/pull/1774
- @detsutut made their first contribution in https://github.com/huggingface/trl/pull/1788
- @noahlt made their first contribution in https://github.com/huggingface/trl/pull/1794
Full Changelog: https://github.com/huggingface/trl/compare/v0.9.4...v0.9.6