v0.11.0

huggingface/trl

Release date: 2024-09-19 16:46:19

Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)

We are excited to introduce the new v0.11.0 release, with many new features and post-training algorithms. The highlights are as follows:

New post-training methods

Generalized Knowledge Distillation

Generalized Knowledge Distillation (GKD) is a post-training method from Google DeepMind that extends standard knowledge distillation by letting the student generate its own outputs during training and receive online feedback on them from the teacher. In the paper's experiments it consistently outperforms supervised fine-tuning (SFT), and in some cases it lets the student match the teacher's performance with far fewer parameters.

To train models with this method, check out the GKDTrainer.
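The core of GKD can be sketched in plain Python. This is a minimal illustration, not the GKDTrainer implementation. It assumes the generalized Jensen-Shannon divergence described in the GKD paper, beta * KL(P || M) + (1 - beta) * KL(Q || M) with M = beta * P + (1 - beta) * Q. Here P is taken to be the teacher's next-token distribution and Q the student's, and the divergence is averaged over the positions of a sequence. The helper `pick_data_source` and its `on_policy_fraction` argument are hypothetical. They stand in for the step where the student generates its own training sequences.

```python
import math
import random
from typing import List, Sequence

Dist = Sequence[float]  # next-token probability distribution over a (toy) vocabulary


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(teacher: Dist, student: Dist, beta: float) -> float:
    """beta * KL(P || M) + (1 - beta) * KL(Q || M), with M = beta * P + (1 - beta) * Q.

    P is the teacher distribution and Q the student distribution (assumed convention).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    mix = [beta * p + (1.0 - beta) * q for p, q in zip(teacher, student)]
    return beta * kl(teacher, mix) + (1.0 - beta) * kl(student, mix)


def sequence_loss(teacher_steps: List[Dist], student_steps: List[Dist], beta: float) -> float:
    """Average the token-level divergence over all positions of one sequence."""
    losses = [generalized_jsd(t, s, beta) for t, s in zip(teacher_steps, student_steps)]
    return sum(losses) / len(losses)


def pick_data_source(on_policy_fraction: float, rng: random.Random) -> str:
    """Hypothetical helper: with probability `on_policy_fraction`, train on a sequence
    the student generated itself (scored token by token by the teacher); otherwise
    train on a sequence from the fixed dataset."""
    return "student" if rng.random() < on_policy_fraction else "dataset"


if __name__ == "__main__":
    rng = random.Random(0)
    print(pick_data_source(0.5, rng))
    teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    student = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
    print(sequence_loss(teacher, student, beta=0.5))
```

Training would minimize this loss with respect to the student's parameters, on sequences drawn from either source. With `on_policy_fraction=0`, the sketch reduces to distillation on a fixed dataset.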

Exploratory Preference Optimization

Exploratory Preference Optimization (XPO) is an online post-training method from researchers at Microsoft, MIT, and the University of Wisconsin. It extends DPO to incorporate online feedback from reward models or LLM judges. It is similar to online DPO but adds an exploration bonus to the DPO objective, which gives it a different theoretical basis with guarantees on sample efficiency.

To train models with this method, check out the XPOTrainer.
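The sketch below evaluates the XPO loss once. It is a minimal illustration under an assumed form of the objective: the usual DPO logistic loss on a freshly sampled pair that a reward model or judge has just ordered, plus an exploration term equal to `alpha` times the policy's log-probability of a response sampled from the reference model. The scoring callable and all argument names are illustrative assumptions, not the XPOTrainer API. Log-probabilities are assumed to be summed over the tokens of each response.

```python
import math
from typing import Callable, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def order_pair(a: str, b: str, score: Callable[[str], float]) -> Tuple[str, str]:
    """Online feedback: a reward model or judge (here any scoring callable)
    decides which of two freshly sampled completions is chosen and which is rejected."""
    return (a, b) if score(a) >= score(b) else (b, a)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float, beta: float) -> float:
    """DPO logistic loss for one pair; inputs are summed sequence log-probabilities."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -log_sigmoid(margin)


def xpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             logp_of_ref_sample: float, beta: float, alpha: float) -> float:
    """Assumed XPO form: DPO loss plus alpha * log pi(y_ref | x), where y_ref is a
    response sampled from the reference model. Minimizing the extra term lowers the
    policy's probability of reference-model samples, nudging it to explore."""
    dpo = dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta)
    return dpo + alpha * logp_of_ref_sample


if __name__ == "__main__":
    chosen, rejected = order_pair("a detailed answer", "ok", score=len)  # toy judge: longer wins
    print(chosen, "|", rejected)
    print(xpo_loss(-12.0, -15.0, -13.0, -14.0, logp_of_ref_sample=-20.0, beta=0.1, alpha=1e-5))
```

Setting `alpha=0` recovers plain online DPO in this sketch.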

Nash Learning from Human Feedback

Nash Learning from Human Feedback is a post-training method from Google DeepMind that uses pairwise preference models. These models are conditioned on two responses to the same prompt, rather than on the single response scored by a reward model. The preference model is then used to train a policy whose responses are consistently preferred over those of competing policies. This approximates the Nash equilibrium of a two-player game in which actions are responses and payoffs are given by the preference model.

To train models with this method, check out the NashMDTrainer.
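The toy example below makes the Nash-equilibrium framing concrete. It uses a tabular preference model over three responses with cyclic preferences (A beats B, B beats C, C beats A). No single-input reward score can represent these preferences, because a scalar score always induces a transitive ranking. The equilibrium is approximated with Hedge (multiplicative-weights) self-play and an averaged iterate, a standard method for two-player zero-sum games. This is a stand-in for illustration, not the Nash-MD algorithm implemented by NashMDTrainer. The matrix, initial policy, and step size are arbitrary assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def preference_vs_policy(pref: Matrix, policy: Sequence[float]) -> List[float]:
    """For each response y, P(y > pi): the probability y is preferred to a sample from pi.
    pref[y][z] holds P(y > z), so pref[y][z] + pref[z][y] == 1."""
    return [sum(pref[y][z] * policy[z] for z in range(len(policy))) for y in range(len(pref))]


def exploitability(pref: Matrix, policy: Sequence[float]) -> float:
    """How far the best single response beats the policy: max_y P(y > pi) - 1/2.
    It is 0 exactly at a Nash equilibrium of the preference game."""
    return max(preference_vs_policy(pref, policy)) - 0.5


def hedge_self_play(pref: Matrix, init: Sequence[float], eta: float, steps: int) -> List[float]:
    """Multiplicative-weights (Hedge) self-play; returns the average policy over all steps.
    `init` must be strictly positive."""
    n = len(init)
    cum_gain = [0.0] * n
    avg = [0.0] * n
    for _ in range(steps):
        # pi_t(y) is proportional to init(y) * exp(eta * cumulative advantage of y)
        logits = [math.log(init[y]) + eta * cum_gain[y] for y in range(n)]
        top = max(logits)
        weights = [math.exp(logit - top) for logit in logits]
        total = sum(weights)
        policy = [w / total for w in weights]
        for y in range(n):
            avg[y] += policy[y] / steps
        # The opponent is the current policy itself (self-play).
        for y, p in enumerate(preference_vs_policy(pref, policy)):
            cum_gain[y] += p - 0.5
    return avg


if __name__ == "__main__":
    # Cyclic preferences: A beats B, B beats C, C beats A (rows/cols in order A, B, C).
    pref = [[0.5, 1.0, 0.0],
            [0.0, 0.5, 1.0],
            [1.0, 0.0, 0.5]]
    avg = hedge_self_play(pref, init=[0.6, 0.3, 0.1], eta=0.05, steps=10_000)
    print([round(p, 3) for p in avg], exploitability(pref, avg))
```

For this matrix the equilibrium is the uniform policy: no single response is preferred to it more than half the time. By contrast, any deterministic policy is beaten with probability 1 by the response that counters it.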

New trainer features

Deprecations 🚨

Bugfixes and improvements

New Contributors

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.6...v0.11.0
