v0.9.3
Release date: 2024-06-06 00:08:05
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
We are excited to introduce the v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:
- RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started
- PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more aligned with OpenAI's PPO implementation based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started
- Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.
https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84
- New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment
- New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)
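For intuition on the leave-one-out baseline behind the new RLOO trainer: with k completions sampled per prompt, each completion's advantage is its own reward minus the mean reward of the remaining k−1 samples, which serves as a variance-reducing baseline. A minimal sketch of that idea (illustrative only, not the TRL implementation):

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k sampled completions of one prompt.

    advantage_i = reward_i - mean(rewards of the other k-1 completions)
    """
    k = len(rewards)
    total = sum(rewards)
    # Baseline for sample i is the mean of the other samples' rewards:
    # (total - rewards[i]) / (k - 1)
    return [r - (total - r) / (k - 1) for r in rewards]
```

A useful property, easy to verify by hand, is that the advantages of each group sum to zero, so the baseline never shifts the average update direction.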
What's Changed
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1568
- fix add_special_tokens issue for data with template by @edixiong in https://github.com/huggingface/trl/pull/1509
- [DPO] add 'bco_pair' loss_type by @seanexp in https://github.com/huggingface/trl/pull/1524
- [DPO] DPOConfig class by @kashif in https://github.com/huggingface/trl/pull/1554
- [SFT] add SFT Trainer Config dataclass by @kashif in https://github.com/huggingface/trl/pull/1530
- FIX: Fix CI on transformers main by @younesbelkada in https://github.com/huggingface/trl/pull/1576
- [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by @younesbelkada in https://github.com/huggingface/trl/pull/1577
- Fix typo detoxifying doc by @qgallouedec in https://github.com/huggingface/trl/pull/1594
- Core: removed nonexistent `SftArgumentParser` by @younesbelkada in https://github.com/huggingface/trl/pull/1602
- [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by @seanexp in https://github.com/huggingface/trl/pull/1599
- [CLI] Use auto device map for model load by @lewtun in https://github.com/huggingface/trl/pull/1596
- Removing `tests/` from package data by @jamesbraza in https://github.com/huggingface/trl/pull/1607
- Docs: Fix build main documentation by @younesbelkada in https://github.com/huggingface/trl/pull/1604
- support loss function for Self-play Preference Optimization by @winglian in https://github.com/huggingface/trl/pull/1612
- Update HH dataset on helpful only subset by @vwxyzjn in https://github.com/huggingface/trl/pull/1613
- corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in https://github.com/huggingface/trl/pull/1615
- Fix ZeRO-3 generation context manager by @lewtun in https://github.com/huggingface/trl/pull/1617
- fixed adding bos and eos token unconditionally by @jasonyux in https://github.com/huggingface/trl/pull/1591
- visualize rm prediction by @vwxyzjn in https://github.com/huggingface/trl/pull/1636
- [ORPO] Correct label mask for pad tokens by @IlyaGusev in https://github.com/huggingface/trl/pull/1625
- Update sft_llama2.py to work with the latest API by @xianbaoqian in https://github.com/huggingface/trl/pull/1637
- Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in https://github.com/huggingface/trl/pull/1641
- Pairwise Noise Contrastive Alignment by @winglian in https://github.com/huggingface/trl/pull/1632
- don't cast the trainable lora layers to half precision by @pacman100 in https://github.com/huggingface/trl/pull/1644
- PPO / Reinforce Trainers by @vwxyzjn in https://github.com/huggingface/trl/pull/1540
- Apply deprecated `evaluation_strategy` by @muellerzr in https://github.com/huggingface/trl/pull/1559
- FEAT: Add support for training collator in PPOTrainer by @younesbelkada in https://github.com/huggingface/trl/pull/1658
- Correct Documentation for cDPO Usage by @AliBakly in https://github.com/huggingface/trl/pull/1655
- Fix inheritance order in PPOv2Config by @Nicolinho in https://github.com/huggingface/trl/pull/1659
- [DPO] Add 'robust' loss_type by @Abilityguy in https://github.com/huggingface/trl/pull/1653
- 🤫 TR-DPO implementation by @syrn1k in https://github.com/huggingface/trl/pull/1593
- Do not upcast adapters when using FSDP+QLoRA by @pacman100 in https://github.com/huggingface/trl/pull/1654
- [Tests] update eval_strategy API by @kashif in https://github.com/huggingface/trl/pull/1662
- Fix ppov2 test case by @vwxyzjn in https://github.com/huggingface/trl/pull/1661
- FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by @younesbelkada in https://github.com/huggingface/trl/pull/1664
- fix dataset load error by @sywangyi in https://github.com/huggingface/trl/pull/1670
- FIX / SFTTrainer: Fix SFTTrainer with `args=None` by @younesbelkada in https://github.com/huggingface/trl/pull/1678
- Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in https://github.com/huggingface/trl/pull/1588
- initial RPO loss by @kashif in https://github.com/huggingface/trl/pull/1686
- Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in https://github.com/huggingface/trl/pull/1690
- Skip packing validation by @alex-jw-brooks in https://github.com/huggingface/trl/pull/1673
- Fix typo in DPOTrainer's warnings by @qgallouedec in https://github.com/huggingface/trl/pull/1688
- Quick fix on GPT4-eval by @vwxyzjn in https://github.com/huggingface/trl/pull/1696
- Release 0.9.2 by @vwxyzjn in https://github.com/huggingface/trl/pull/1697
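Several of the DPO changes above add new `loss_type` options, including the "robust" loss from https://github.com/huggingface/trl/pull/1653. As an illustrative, self-contained sketch (not the TRL implementation) of how the Robust DPO loss relates to the standard sigmoid DPO loss under an assumed label-noise rate ε:

```python
import math

def logsigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_loss(chosen_logratio, rejected_logratio, beta=0.1, eps=0.0, robust=False):
    """Per-pair DPO loss from policy-vs-reference log-ratios.

    chosen_logratio  = log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = log pi(y_l|x) - log pi_ref(y_l|x)
    With robust=True, applies the Robust DPO correction for a
    symmetric preference-label noise rate eps (0 <= eps < 0.5).
    """
    logits = beta * (chosen_logratio - rejected_logratio)
    if robust:
        # Debiased estimate of the clean loss under label noise;
        # reduces to the standard sigmoid loss when eps == 0.
        return (-(1 - eps) * logsigmoid(logits) + eps * logsigmoid(-logits)) / (1 - 2 * eps)
    return -logsigmoid(logits)
```

With eps set to 0, the robust variant collapses to the standard sigmoid DPO loss, which is a quick sanity check when experimenting with the new loss types.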
New Contributors
- @edixiong made their first contribution in https://github.com/huggingface/trl/pull/1509
- @seanexp made their first contribution in https://github.com/huggingface/trl/pull/1524
- @jamesbraza made their first contribution in https://github.com/huggingface/trl/pull/1607
- @winglian made their first contribution in https://github.com/huggingface/trl/pull/1612
- @angelahzyuan made their first contribution in https://github.com/huggingface/trl/pull/1615
- @jasonyux made their first contribution in https://github.com/huggingface/trl/pull/1591
- @IlyaGusev made their first contribution in https://github.com/huggingface/trl/pull/1625
- @xianbaoqian made their first contribution in https://github.com/huggingface/trl/pull/1637
- @bartoszzuk made their first contribution in https://github.com/huggingface/trl/pull/1641
- @muellerzr made their first contribution in https://github.com/huggingface/trl/pull/1559
- @AliBakly made their first contribution in https://github.com/huggingface/trl/pull/1655
- @Nicolinho made their first contribution in https://github.com/huggingface/trl/pull/1659
- @Abilityguy made their first contribution in https://github.com/huggingface/trl/pull/1653
- @syrn1k made their first contribution in https://github.com/huggingface/trl/pull/1593
- @alexisrozhkov made their first contribution in https://github.com/huggingface/trl/pull/1690
- @alex-jw-brooks made their first contribution in https://github.com/huggingface/trl/pull/1673
Full Changelog: https://github.com/huggingface/trl/compare/v0.8.6...v0.9.2