v0.11.0
Release date: 2024-09-19 16:46:19
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
We are excited to introduce the new v0.11.0 release, with many new features and post-training algorithms. The highlights are as follows:
New post-training methods
Generalized Knowledge Distillation
Generalized Knowledge Distillation (GKD) is a post-training method from Google DeepMind that extends standard knowledge distillation by allowing the student to generate outputs during training and receive online feedback from the teacher. It consistently outperforms SFT and in some cases enables the student model to match the performance of the teacher, but with far fewer parameters.
To train models with this method, check out the `GKDTrainer`.
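To make the idea concrete, here is a minimal, library-free sketch of the two ingredients described above: a divergence between the teacher's and the student's next-token distributions, and a coin flip that decides whether a training sequence comes from the dataset or from the student itself. It is not the `GKDTrainer` implementation. The generalized Jensen-Shannon divergence is one of the divergences used with this method, and the names `beta`, `lmbda`, `generalized_jsd`, and `choose_training_sequence`, along with the toy distributions, are assumptions made for illustration.

```python
import math
import random


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(teacher_probs, student_probs, beta=0.5):
    """Generalized Jensen-Shannon divergence with mixing weight beta (illustrative)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1 in this sketch")
    mixture = [beta * t + (1 - beta) * s for t, s in zip(teacher_probs, student_probs)]
    return (beta * kl_divergence(teacher_probs, mixture)
            + (1 - beta) * kl_divergence(student_probs, mixture))


def choose_training_sequence(dataset_sequence, generate_from_student, lmbda, rng):
    """With probability lmbda, train on a student-generated sequence (on-policy)."""
    if rng.random() < lmbda:
        return generate_from_student()
    return dataset_sequence


# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]
loss = generalized_jsd(teacher, student, beta=0.5)

rng = random.Random(0)
sequence = choose_training_sequence("from dataset", lambda: "from student", 0.5, rng)
```

The loss is zero only when the student matches the teacher on every token, and setting `lmbda` to 0 in this sketch falls back to training purely on dataset sequences.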
Exploratory Preference Optimization
Exploratory Preference Optimization is an online post-training method from researchers at Microsoft, MIT, and Wisconsin that extends DPO to incorporate online feedback from reward models or LLM judges. It is similar to online DPO, but has a slightly different theoretical basis concerning sample efficiency.
To train models with this method, check out the `XPOTrainer`.
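The online feedback loop that XPO shares with online DPO can be sketched without any library: sample two completions, let a reward function decide which one is preferred, and feed the resulting pair into a DPO-style loss. This sketch deliberately shows only that shared part and omits whatever XPO adds on top of it. The function names, the length-based toy reward, and the log-probability values are all hypothetical.

```python
import math


def label_pair(prompt, completion_a, completion_b, reward_fn):
    """Return (chosen, rejected) according to a reward function or judge."""
    reward_a = reward_fn(prompt, completion_a)
    reward_b = reward_fn(prompt, completion_b)
    if reward_a >= reward_b:
        return completion_a, completion_b
    return completion_b, completion_a


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * difference of log-ratios))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return math.log1p(math.exp(-margin))


# Toy reward that simply prefers longer completions (illustration only).
def toy_reward(prompt, completion):
    return len(completion)


chosen, rejected = label_pair("What is TRL?", "A library.",
                              "A library for post-training language models.",
                              toy_reward)

# Made-up sequence log-probabilities under the policy and the reference model.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0)
```

Because the pairs are labelled during training rather than read from a fixed dataset, the policy is always scored on its own current outputs.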
Nash Learning with Human Feedback
Nash Learning with Human Feedback is a novel post-training method from Google DeepMind that uses pairwise preference models, which are conditioned on two inputs instead of the single one used in reward models. These preference models are then used to train a policy that consistently produces responses that are preferred over those from competing policies, thus approximating the Nash equilibrium of a two-player game in which actions are responses and payoffs are given by the preference model.
To train models with this method, check out the `NashMDTrainer`.
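The game-theoretic picture can be illustrated on a toy problem with three candidate responses and a fixed preference table. The sketch below uses plain multiplicative-weights self-play, which is a simpler stand-in chosen for illustration and not the algorithm implemented by `NashMDTrainer`. The preference values, function name, and hyperparameters are assumptions.

```python
import math


def self_play_policy(preference, steps=200, learning_rate=1.0):
    """Approximate an equilibrium policy by multiplicative-weights self-play.

    preference[i][j] is the probability that response i is preferred over
    response j, so preference[i][j] + preference[j][i] == 1.
    """
    n = len(preference)
    policy = [1.0 / n] * n
    for _ in range(steps):
        # Expected win rate of each response against the current policy.
        win_rates = [sum(preference[i][j] * policy[j] for j in range(n))
                     for i in range(n)]
        # Up-weight responses in proportion to how often they win.
        weights = [policy[i] * math.exp(learning_rate * win_rates[i])
                   for i in range(n)]
        total = sum(weights)
        policy = [w / total for w in weights]
    return policy


# Toy preference table: response 0 beats both others more often than not.
preference = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]
policy = self_play_policy(preference)
```

In this table response 0 is preferred over every alternative, so the policy concentrates almost all of its probability on it. With cyclic preferences, no single response would win and the equilibrium would be a mixture.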
New trainer features
- Online DPO now supports training LoRA adapters with PEFT, which means you can dramatically reduce the amount of VRAM needed to train models with this method. By @qgallouedec in https://github.com/huggingface/trl/pull/2041
- The `OrpoTrainer` has better integration with PyTorch/XLA for faster step time on TPUs ⚡. By @wenxindongwork in https://github.com/huggingface/trl/pull/2001
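As a rough illustration of why LoRA adapters reduce memory for Online DPO, the arithmetic below compares the number of trainable parameters in one full weight matrix with a rank-`r` adapter for the same matrix. The 4096 x 4096 layer size and rank 16 are assumed values, and the actual VRAM saving also depends on optimizer state, activations, and how many layers are adapted.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters of a LoRA adapter: two low-rank factors, A and B."""
    return rank * (d_in + d_out)


d_in = d_out = 4096  # assumed layer size
rank = 16            # assumed adapter rank

full = full_finetune_params(d_in, d_out)  # 16_777_216
lora = lora_params(d_in, d_out, rank)     # 131_072
fraction = lora / full                    # 0.0078125, i.e. under 1%
```

Fewer trainable parameters means smaller gradients and optimizer states, which is where most of the saving comes from.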
Deprecations 🚨
- The `PPOTrainer` is marked for deprecation in favour of `PPOv2Trainer` to provide a consistent API across TRL's trainers. It will be removed in `v0.12.0`. By @qgallouedec in https://github.com/huggingface/trl/pull/2016
- The `RichProgressCallback` has been removed from the example scripts as it caused a variety of problems with logging in distributed environments. You can still use it by adding it manually to the trainer callbacks. By @lewtun in https://github.com/huggingface/trl/pull/2053
Bugfixes and improvements
- Adds experimental Liger support to SFT script by @edbeeching in https://github.com/huggingface/trl/pull/1992
- move slow-tests CI to new cluster by @glegendre01 in https://github.com/huggingface/trl/pull/1996
- [Online-DPO] fixes to the training scripts and setup.py by @kashif in https://github.com/huggingface/trl/pull/1997
- [pre-commit] update pre-commit yaml by @kashif in https://github.com/huggingface/trl/pull/2002
- [Docs] Add Liger-Kernel usage to SFTTrainer page by @ryankert01 in https://github.com/huggingface/trl/pull/2007
- [ci] pin numpy to < 2 on windows by @kashif in https://github.com/huggingface/trl/pull/2009
- Remove `prompts` arg from `WinRateCallback` by @qgallouedec in https://github.com/huggingface/trl/pull/2010
- Allow `WinRateCallback` to be used without reference model by @qgallouedec in https://github.com/huggingface/trl/pull/2013
- Feat: Add support for APO-zero in KTOTrainer by @KarelDO in https://github.com/huggingface/trl/pull/1952
- Clean configs documentation by @qgallouedec in https://github.com/huggingface/trl/pull/1944
- Refactor reward modelling script to work with chat models by @lewtun in https://github.com/huggingface/trl/pull/2026
- correct formatting of star sign in kto_trainer.mdx by @mattany in https://github.com/huggingface/trl/pull/2031
- Remove unused functions in `core.py` by @northern-64bit in https://github.com/huggingface/trl/pull/2017
- Improves formatting of docstring + newlines by @northern-64bit in https://github.com/huggingface/trl/pull/2006
- Fix `packing` doc in `SFTConfig` and fix error when neither `dataset_text_field` nor `formatting_func` is provided. by @qgallouedec in https://github.com/huggingface/trl/pull/2035
- fix: unpackaging error in Custom Mixture of Experts model when `aux_loss_enabled` is set to True. by @Jonathanjordan21 in https://github.com/huggingface/trl/pull/2039
- Drop canonical namespaces by @qgallouedec in https://github.com/huggingface/trl/pull/2048
- Change `non_eos_penalty` to be consistent across `OnPolicy` trainers by @RylanSchaeffer in https://github.com/huggingface/trl/pull/2033
- Temporary pin the transformers hash in the CI by @qgallouedec in https://github.com/huggingface/trl/pull/2049
- [XPO] xpo trainer by @kashif in https://github.com/huggingface/trl/pull/1943
- Fix logits computation in KTO trainer prediction step by @issamemari in https://github.com/huggingface/trl/pull/2050
- [Draft, don't merge] Fix failing windows by @LysandreJik in https://github.com/huggingface/trl/pull/2051
- Clean up DPO example by @lewtun in https://github.com/huggingface/trl/pull/2043
- Remove `debug` and `sanity_check` args by @qgallouedec in https://github.com/huggingface/trl/pull/2055
- Gkd trainer by @kashif in https://github.com/huggingface/trl/pull/1814
- Documentation dataset format by @qgallouedec in https://github.com/huggingface/trl/pull/2020
- Add missing autodocs by @qgallouedec in https://github.com/huggingface/trl/pull/2056
- Mask loss in gkd when generating from the student by @gaetanlop in https://github.com/huggingface/trl/pull/2058
- ©️ Copyrights by @qgallouedec in https://github.com/huggingface/trl/pull/2063
- Support for `SFTTrainer.evaluate()` and `SFTTrainer.predict()` with null train_dataset by @Sohaib9920 in https://github.com/huggingface/trl/pull/2004
- make cuda-only tests device-agnostic by @faaany in https://github.com/huggingface/trl/pull/2044
- Make `ConstantLengthDataset` (or `packing=True`) shuffle examples before they are packed by @muupan in https://github.com/huggingface/trl/pull/2037
- Standardise API for `WinRateCallback` and `LogCompletionsCallback` by @lewtun in https://github.com/huggingface/trl/pull/2061
- Fix dataset in GKD script by @lewtun in https://github.com/huggingface/trl/pull/2067
- [online models] remove min_new_tokens=args.max_new_tokens by @kashif in https://github.com/huggingface/trl/pull/2069
- Standardising datasets for testing by @qgallouedec in https://github.com/huggingface/trl/pull/2065
- [KTO] learning rate recommendations for kto by @kashif in https://github.com/huggingface/trl/pull/2070
- Nash md by @kashif in https://github.com/huggingface/trl/pull/1853
- Use `transformers` utilities when possible by @qgallouedec in https://github.com/huggingface/trl/pull/2064
- Minor doc fixes and comments by @qgallouedec in https://github.com/huggingface/trl/pull/2073
- Added error check to RLOO, PPOv2, OnlineDPO that `ref_policy` and `policy` have different identities by @RylanSchaeffer in https://github.com/huggingface/trl/pull/2057
- `processor(prompt, images=image)` to `processor(images=image, text=prompt)` by @qgallouedec in https://github.com/huggingface/trl/pull/2076
- Use wrapped model for reference completions in `WinRateCallback` and set default `freq` to `eval_steps` in `LogCompletionsCallback` by @lewtun in https://github.com/huggingface/trl/pull/2074
- Conversational dataset support for Online DPO by @qgallouedec in https://github.com/huggingface/trl/pull/2075
- [WIP] Fix `logits/chosen` and `logits/rejected` metrics in `kto_trainer`. by @PhilipMay in https://github.com/huggingface/trl/pull/2077
- Standardize dataset naming by @qgallouedec in https://github.com/huggingface/trl/pull/2081
- Fix deepspeed for `PPOv2Trainer` by @qgallouedec in https://github.com/huggingface/trl/pull/2080
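One item in the list above, the check that `ref_policy` and `policy` have different identities, is easy to picture in a few lines. If both arguments point to the same object, the reference model is updated together with the policy and no longer serves as a fixed baseline. The sketch below is only an illustration of such a check, not the code from the pull request, and the `TinyModel` class, function name, and error message are hypothetical.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a model object."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]


def check_distinct_models(policy, ref_policy):
    """Raise if the policy and the reference policy are the very same object."""
    if policy is ref_policy:
        raise ValueError(
            "`policy` and `ref_policy` must be different objects; "
            "pass a separate copy of the model as the reference."
        )


policy = TinyModel()
ref_policy = copy.deepcopy(policy)  # independent copy: passes the check
check_distinct_models(policy, ref_policy)
```

Passing the same object twice raises a `ValueError`, which surfaces the mistake before any training step runs.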
New Contributors
- @AdnaneKhan made their first contribution in https://github.com/huggingface/trl/pull/1822
- @mkopecki made their first contribution in https://github.com/huggingface/trl/pull/1825
- @DZ9 made their first contribution in https://github.com/huggingface/trl/pull/1836
- @MAOJIASONG made their first contribution in https://github.com/huggingface/trl/pull/1840
- @davanstrien made their first contribution in https://github.com/huggingface/trl/pull/1845
- @eliebak made their first contribution in https://github.com/huggingface/trl/pull/1863
- @Rishav-hub made their first contribution in https://github.com/huggingface/trl/pull/1862
- @cemiu made their first contribution in https://github.com/huggingface/trl/pull/1738
- @SunMarc made their first contribution in https://github.com/huggingface/trl/pull/1919
- @karel-contextual made their first contribution in https://github.com/huggingface/trl/pull/1928
- @RylanSchaeffer made their first contribution in https://github.com/huggingface/trl/pull/1932
- @mina-parham made their first contribution in https://github.com/huggingface/trl/pull/1961
- @RhuiDih made their first contribution in https://github.com/huggingface/trl/pull/1887
- @SeungyounShin made their first contribution in https://github.com/huggingface/trl/pull/1969
- @kit1980 made their first contribution in https://github.com/huggingface/trl/pull/1933
- @akakakakakaa made their first contribution in https://github.com/huggingface/trl/pull/1987
- @hvaara made their first contribution in https://github.com/huggingface/trl/pull/1990
- @glegendre01 made their first contribution in https://github.com/huggingface/trl/pull/1996
- @ryankert01 made their first contribution in https://github.com/huggingface/trl/pull/2007
- @KarelDO made their first contribution in https://github.com/huggingface/trl/pull/1952
- @mattany made their first contribution in https://github.com/huggingface/trl/pull/2031
- @northern-64bit made their first contribution in https://github.com/huggingface/trl/pull/2017
- @Jonathanjordan21 made their first contribution in https://github.com/huggingface/trl/pull/2039
- @issamemari made their first contribution in https://github.com/huggingface/trl/pull/2050
- @wenxindongwork made their first contribution in https://github.com/huggingface/trl/pull/2001
- @Sohaib9920 made their first contribution in https://github.com/huggingface/trl/pull/2004
- @faaany made their first contribution in https://github.com/huggingface/trl/pull/2044
- @muupan made their first contribution in https://github.com/huggingface/trl/pull/2037
- @PhilipMay made their first contribution in https://github.com/huggingface/trl/pull/2077
Full Changelog: https://github.com/huggingface/trl/compare/v0.9.6...v0.11.0