v0.7.2
Release date: 2023-10-12 21:32:05
Latest huggingface/trl release: v0.11.1 (2024-09-25 00:13:05)
0.7.2: Flash Attention documentation and minor bugfixes
In this release we provide minor bugfixes and a smoother user experience for all public classes. We also clarified in the documentation how to use Flash Attention with `SFTTrainer`.
How to use Flash Attention with `SFTTrainer`:
- Update sft_trainer.mdx to highlight Flash Attention features by @younesbelkada in https://github.com/huggingface/trl/pull/807
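For orientation, a minimal sketch of the pattern the updated docs cover (not taken verbatim from them): load the model with Flash Attention enabled, then train with `SFTTrainer`. The model name and dataset below are illustrative assumptions, and the exact flag has varied across transformers versions (`use_flash_attention_2=True` around this release; `attn_implementation="flash_attention_2"` in newer ones):

```python
# A hedged sketch, not from the release notes: model/dataset names are
# placeholders; flash-attn must be installed and the GPU must support it.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # any FA2-compatible architecture
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Flash Attention 2 requires fp16/bf16
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # dataset column containing raw text
    max_seq_length=512,
)
trainer.train()
```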
What's Changed
- Release: v0.7.1 by @younesbelkada in https://github.com/huggingface/trl/pull/709
- set dev version by @younesbelkada in https://github.com/huggingface/trl/pull/710
- fix device issue by @backpropper in https://github.com/huggingface/trl/pull/681
- Update docs on gsm8k by @vwxyzjn in https://github.com/huggingface/trl/pull/711
- [`Docs`] Fix sft mistakes by @younesbelkada in https://github.com/huggingface/trl/pull/717
- Fix: RuntimeError: 'weight' must be 2-D issue by @jp1924 in https://github.com/huggingface/trl/pull/687
- Add pyproject.toml by @mnoukhov in https://github.com/huggingface/trl/pull/690
- [`core`] Bump peft to 0.4.0 by @younesbelkada in https://github.com/huggingface/trl/pull/720
- Refactor RewardTrainer hyperparameters into dedicated dataclass by @lewtun in https://github.com/huggingface/trl/pull/726
- Fix DeepSpeed ZeRO-3 in PPOTrainer by @lewtun in https://github.com/huggingface/trl/pull/730
- [`SFTTrainer`] Check correctly for condition by @younesbelkada in https://github.com/huggingface/trl/pull/668
- Add epsilon to score normalization by @zfang in https://github.com/huggingface/trl/pull/727
- Enable gradient checkpointing to be disabled for reward modelling by @lewtun in https://github.com/huggingface/trl/pull/725
- [DPO] fixed metrics typo by @kashif in https://github.com/huggingface/trl/pull/743
- Seq2Seq model support for DPO by @gaetanlop in https://github.com/huggingface/trl/pull/586
- [DPO] fix ref_model by @i4never in https://github.com/huggingface/trl/pull/745
- [`core`] Fix import of `randn_tensor` by @younesbelkada in https://github.com/huggingface/trl/pull/751
- Add benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/752
- Update to `prepare_model_for_kbit_training` by @mnoukhov in https://github.com/huggingface/trl/pull/728
- Benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/755
- EOS token processing for multi-turn DPO by @natolambert in https://github.com/huggingface/trl/pull/741
- Extend DeepSpeed integration to ZeRO-{1,2,3} by @lewtun in https://github.com/huggingface/trl/pull/758
- Improve benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/760
- [PPOTrainer] - add comment of zero masking (from second query token) by @zuoxingdong in https://github.com/huggingface/trl/pull/763
- Refactor and benchmark by @vwxyzjn in https://github.com/huggingface/trl/pull/662
- Benchmark CI (actual) by @vwxyzjn in https://github.com/huggingface/trl/pull/754
- docs: add initial version of docs for `PPOTrainer` by @davidberenstein1957 in https://github.com/huggingface/trl/pull/665
- Support fork in benchmark CI by @vwxyzjn in https://github.com/huggingface/trl/pull/764
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/773
- Benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/775
- Benchmark CI fix by @vwxyzjn in https://github.com/huggingface/trl/pull/776
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/777
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/778
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/779
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/780
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/781
- Update benchmark.yml by @vwxyzjn in https://github.com/huggingface/trl/pull/782
- Ensure `RewardConfig` is backwards compatible by @lewtun in https://github.com/huggingface/trl/pull/748
- Temp benchmark ci dir by @vwxyzjn in https://github.com/huggingface/trl/pull/765
- Changed the default value of the `log_with` argument by @filippobistaffa in https://github.com/huggingface/trl/pull/792
- Add default Optim to DPO example by @natolambert in https://github.com/huggingface/trl/pull/759
- Add margin to RM training by @jvhoffbauer in https://github.com/huggingface/trl/pull/719
- [`DPO`] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in https://github.com/huggingface/trl/pull/799
- Add deepspeed experiment by @vwxyzjn in https://github.com/huggingface/trl/pull/795
- [`Docs`] Clarify PEFT docs by @younesbelkada in https://github.com/huggingface/trl/pull/797
- Fix docs bug on sft_trainer.mdx by @younesbelkada in https://github.com/huggingface/trl/pull/808
- [`PPOTrainer`] Fixes ppo trainer generate nit by @younesbelkada in https://github.com/huggingface/trl/pull/798
- Allow passing the token_ids as instruction_template in `DataCollatorForCompletionOnlyLM` by @devxpy in https://github.com/huggingface/trl/pull/749 (see the sketch after this list)
- init custom eval loop for further DPO evals by @natolambert in https://github.com/huggingface/trl/pull/766
- Add RMSProp back to DPO by @natolambert in https://github.com/huggingface/trl/pull/821
- [DPO] add option for compute_metrics in DPOTrainer by @kashif in https://github.com/huggingface/trl/pull/822
- Small fixes to the PPO trainer doc and script. by @namin in https://github.com/huggingface/trl/pull/811
- Unify sentiment documentation by @vwxyzjn in https://github.com/huggingface/trl/pull/803
- Fix DeepSpeed ZeRO-{1,2} for DPOTrainer by @lewtun in https://github.com/huggingface/trl/pull/825
- Set trust remote code to false by default by @lewtun in https://github.com/huggingface/trl/pull/833
- [MINOR:TYPOS] Update README.md by @cakiki in https://github.com/huggingface/trl/pull/829
- Clarify docstrings, help messages, assert messages in merge_peft_adapter.py by @larekrow in https://github.com/huggingface/trl/pull/838
- add DDPO to index by @lvwerra in https://github.com/huggingface/trl/pull/826
- Raise error in `create_reference_model()` when ZeRO-3 is enabled by @lewtun in https://github.com/huggingface/trl/pull/840
- Use uniform config by @vwxyzjn in https://github.com/huggingface/trl/pull/817
- Give `lewtun` power by @lvwerra in https://github.com/huggingface/trl/pull/856
- Standardise example scripts by @lewtun in https://github.com/huggingface/trl/pull/842
- Fix version check in import_utils.py by @adampauls in https://github.com/huggingface/trl/pull/853
- Don't use `get_peft_model` if model is already peft by @abhishekkrthakur in https://github.com/huggingface/trl/pull/857
- [`core`] Fix import issues by @younesbelkada in https://github.com/huggingface/trl/pull/859
- Support both old and new diffusers import path by @osanseviero in https://github.com/huggingface/trl/pull/843
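As context for #749 in the list above, a hedged sketch of passing token ids as `instruction_template` to `DataCollatorForCompletionOnlyLM`; the tokenizer choice and template strings are illustrative assumptions, not taken from the PR:

```python
# A sketch under assumptions: the model and templates are placeholders.
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Encoding the templates once and passing the ids avoids cases where the
# template string tokenizes differently depending on surrounding context.
instruction_ids = tokenizer.encode("### Human:", add_special_tokens=False)
response_ids = tokenizer.encode("### Assistant:", add_special_tokens=False)

collator = DataCollatorForCompletionOnlyLM(
    response_template=response_ids,
    instruction_template=instruction_ids,
    tokenizer=tokenizer,
)
# Labels outside the assistant turns are masked to -100, so the loss is
# computed on completions only, including for multi-turn examples.
```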
New Contributors
- @backpropper made their first contribution in https://github.com/huggingface/trl/pull/681
- @jp1924 made their first contribution in https://github.com/huggingface/trl/pull/687
- @i4never made their first contribution in https://github.com/huggingface/trl/pull/745
- @zuoxingdong made their first contribution in https://github.com/huggingface/trl/pull/763
- @davidberenstein1957 made their first contribution in https://github.com/huggingface/trl/pull/665
- @filippobistaffa made their first contribution in https://github.com/huggingface/trl/pull/792
- @devxpy made their first contribution in https://github.com/huggingface/trl/pull/749
- @namin made their first contribution in https://github.com/huggingface/trl/pull/811
- @cakiki made their first contribution in https://github.com/huggingface/trl/pull/829
- @larekrow made their first contribution in https://github.com/huggingface/trl/pull/838
- @adampauls made their first contribution in https://github.com/huggingface/trl/pull/853
- @abhishekkrthakur made their first contribution in https://github.com/huggingface/trl/pull/857
- @osanseviero made their first contribution in https://github.com/huggingface/trl/pull/843
Full Changelog: https://github.com/huggingface/trl/compare/v0.7.1...v0.7.2