v0.7.9
Release date: 2024-01-09 20:06:13
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
v0.7.9: Patch release for DPO & SFTTrainer
This is a patch release that fixes critical issues with SFTTrainer and DPOTrainer, together with minor fixes for PPOTrainer and DataCollatorForCompletionOnlyLM.
What's Changed
- Release: v0.7.8 by @younesbelkada in https://github.com/huggingface/trl/pull/1200
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1201
- Fix instruction token masking by @mgerstgrasser in https://github.com/huggingface/trl/pull/1185
- Fix reported KL in PPO trainer by @mgerstgrasser in https://github.com/huggingface/trl/pull/1180
- [`DPOTrainer`] Fix peft + DPO + bf16 if one uses `generate_during_eval` or pre-computed logits by @younesbelkada in https://github.com/huggingface/trl/pull/1203
- Revert "Address issue #1122" by @younesbelkada in https://github.com/huggingface/trl/pull/1205
- Release: v0.7.9 by @younesbelkada in https://github.com/huggingface/trl/pull/1206
Full Changelog: https://github.com/huggingface/trl/compare/v0.7.8...v0.7.9