v0.8.4

Release date: 2024-04-17 23:22:10

Latest release of huggingface/trl: v0.11.1 (2024-09-25 00:13:05)

This patch release includes important fixes for the CLI and for the KTO and CPO trainers.

What's Changed

set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/1529
[CPO] fix memory leak due to retained value by @kashif in https://github.com/huggingface/trl/pull/1531
VSFT hotfix - adds gen prompt to template and processor to hub by @edbeeching in https://github.com/huggingface/trl/pull/1532
save_model -> save_pretrained in ppo_trainer.mdx by @ejmejm in https://github.com/huggingface/trl/pull/1537
[KTO] support to load the adapter twice by @claralp in https://github.com/huggingface/trl/pull/1542
CLI: Set dataset_text_field to None to allow ChatML automatic template by @younesbelkada in https://github.com/huggingface/trl/pull/1545
FIX: Fix slow test by @younesbelkada in https://github.com/huggingface/trl/pull/1546
Fixed ref model not used in PPO generation by @ejmejm in https://github.com/huggingface/trl/pull/1534
Release: v0.8.4 by @younesbelkada in https://github.com/huggingface/trl/pull/1547

Full Changelog: https://github.com/huggingface/trl/compare/v0.8.3...v0.8.4