v0.8.2
Release date: 2024-04-11 21:51:28
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
ORPO Trainer & Vision LLMs support for SFTTrainer, KTO fixes
This release includes two new trainers: ORPO (from KAIST) and CPO.
It also adds support for vision LLMs such as Llava to SFTTrainer; see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.
ORPO Trainer
- ORPO trainer by @kashif in https://github.com/huggingface/trl/pull/1435
- [ORPO] use log1p for loss by @kashif in https://github.com/huggingface/trl/pull/1491
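As a rough illustration of why log1p matters for an odds-based loss, the sketch below computes log(1 - p) from a log-probability in two ways. It assumes the odds are defined as p / (1 - p), as in the ORPO formulation; the function names are hypothetical, it uses plain Python floats rather than tensors, and it is not the code from the PR above.

```python
import math


def naive_log1m_exp(logp: float) -> float:
    """log(1 - p) computed directly; loses all precision when p is tiny."""
    return math.log(1.0 - math.exp(logp))


def log1m_exp(logp: float) -> float:
    """log(1 - p) computed with log1p, which stays accurate for tiny p.

    Assumes logp < 0 (p < 1); logp == 0 would raise ValueError.
    """
    return math.log1p(-math.exp(logp))


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) given log p."""
    return logp - log1m_exp(logp)


def odds_ratio_term(chosen_logp: float, rejected_logp: float) -> float:
    """-log(sigmoid(log_odds(chosen) - log_odds(rejected))), computed stably."""
    diff = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(x)) == log1p(exp(-x)); branch so exp() never overflows.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

With logp = -50, the naive version rounds 1 - p to exactly 1.0 and returns 0.0, while the log1p version keeps the small negative value. The penalty term falls below log 2 when the chosen response is more likely than the rejected one and rises above it otherwise.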
CPO Trainer
- Add CPOTrainer by @fe1ixxu in https://github.com/huggingface/trl/pull/1382
- Add use_cache=False in {ORPO,CPO}Trainer.concatenated_forward by @alvarobartt in https://github.com/huggingface/trl/pull/1478
- [ORPO] Update NLL loss to use input_ids instead by @alvarobartt in https://github.com/huggingface/trl/pull/1516
VLLMs support for SFTTrainer
You can now use SFTTrainer to fine-tune VLLMs such as Llava!
See: https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details
- Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in https://github.com/huggingface/trl/pull/1518
KTO Fixes
Many fixes were introduced for the KTOTrainer:
- Update KTO example to use better model and ChatML support by @lewtun in https://github.com/huggingface/trl/pull/1485
- [KTO] Use batching to speed up data processing by @lewtun in https://github.com/huggingface/trl/pull/1470
- Update KTO example with good dataset & chat format by @lewtun in https://github.com/huggingface/trl/pull/1481
- [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in https://github.com/huggingface/trl/pull/1499
- [KTO] fix metric logging by @claralp in https://github.com/huggingface/trl/pull/1514
10x PPO!
- Speed up PPO with ZeRO-3 by 10x 🔥 by @lewtun in https://github.com/huggingface/trl/pull/1483
Other fixes
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1463
- Use the standard dataset for DPO CLI by @vwxyzjn in https://github.com/huggingface/trl/pull/1456
- [peft] Update test_reward_trainer.py to fix tests by @kashif in https://github.com/huggingface/trl/pull/1471
- Fix hyperparameters in KTO example by @lewtun in https://github.com/huggingface/trl/pull/1474
- docs: add missing Trainer classes and sort alphabetically by @anakin87 in https://github.com/huggingface/trl/pull/1479
- hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in https://github.com/huggingface/trl/pull/1488
- Ignore chat files by @lewtun in https://github.com/huggingface/trl/pull/1486
- Add DPO link in README by @qgallouedec in https://github.com/huggingface/trl/pull/1502
- Fix typo in how_to_train.md by @ftorres16 in https://github.com/huggingface/trl/pull/1503
- Fix DPO Unsloth example in Docs by @arnavgarg1 in https://github.com/huggingface/trl/pull/1494
- Correct ppo_epochs usage by @muhammed-shihebi in https://github.com/huggingface/trl/pull/1480
- Fix RichProgressCallback by @eggry in https://github.com/huggingface/trl/pull/1496
- Change the device index to device:index by @yuanwu2017 in https://github.com/huggingface/trl/pull/1490
- FIX: use kwargs for RMTrainer by @younesbelkada in https://github.com/huggingface/trl/pull/1515
- Allow streaming (datasets.IterableDataset) by @BramVanroy in https://github.com/huggingface/trl/pull/1468
- Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in https://github.com/huggingface/trl/pull/1520
- [DOC] Add data description for sfttrainer doc by @BramVanroy in https://github.com/huggingface/trl/pull/1521
- Release: v0.8.2 by @younesbelkada in https://github.com/huggingface/trl/pull/1522
New Contributors
- @fe1ixxu made their first contribution in https://github.com/huggingface/trl/pull/1382
- @anakin87 made their first contribution in https://github.com/huggingface/trl/pull/1479
- @galtay made their first contribution in https://github.com/huggingface/trl/pull/1488
- @qgallouedec made their first contribution in https://github.com/huggingface/trl/pull/1502
- @ftorres16 made their first contribution in https://github.com/huggingface/trl/pull/1503
- @arnavgarg1 made their first contribution in https://github.com/huggingface/trl/pull/1494
- @muhammed-shihebi made their first contribution in https://github.com/huggingface/trl/pull/1480
- @eggry made their first contribution in https://github.com/huggingface/trl/pull/1496
- @claralp made their first contribution in https://github.com/huggingface/trl/pull/1514
Full Changelog: https://github.com/huggingface/trl/compare/v0.8.1...v0.8.2