v0.7.0
Release date: 2023-06-24 06:21:52
Latest CarperAI/trlx release: v0.7.0 (2023-06-24 06:21:52)
The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations for new capabilities and improved system performance in large-scale training.
- NeMo PPO by @cat-state in https://github.com/CarperAI/trlx/pull/472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in https://github.com/CarperAI/trlx/pull/353
🦆 PEFT Migration
`trlx` now supports parameter-efficient tuning methods via the `peft` library, which we hope will provide greater access to RLHF training in low-resource settings.
- peft to opendelta migration (#434) + memory optimization (#320) by @glerzing in https://github.com/CarperAI/trlx/pull/486
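The saving that motivates parameter-efficient tuning can be shown with simple arithmetic. The sketch below assumes a LoRA-style low-rank update (one of the methods available in `peft`) applied to a single square weight matrix; the hidden size and rank are hypothetical values, and the code is an illustration, not trlx or peft code.

```python
# Hypothetical illustration: trainable parameters for a full update of one
# d_out x d_in weight matrix versus a LoRA-style update W + B @ A, where
# B is d_out x r and A is r x d_in. Sizes below are assumptions.

def full_update_params(d_out: int, d_in: int) -> int:
    """Parameters trained when the whole weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters trained for the low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


d = 4096  # hypothetical hidden size
r = 8     # hypothetical LoRA rank
full = full_update_params(d, d)
lora = lora_update_params(d, d, r)
print(full, lora, f"{lora / full:.4%}")  # prints: 16777216 65536 0.3906%
```

Only the small factors receive gradients and optimizer state, which is where most of the memory saving in low-resource settings comes from.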
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in https://github.com/CarperAI/trlx/pull/414
- Convert tensors in the stats dict into scalars by @ZHAOTING in https://github.com/CarperAI/trlx/pull/417
- Add Translation Finetuning Example with T5 by @alexandremuzio in https://github.com/CarperAI/trlx/pull/392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in https://github.com/CarperAI/trlx/pull/409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in https://github.com/CarperAI/trlx/pull/418
- fix(CI): use pinned deps for CI testing by @jon-tow in https://github.com/CarperAI/trlx/pull/423
- Minibatch impl by @Dahoas in https://github.com/CarperAI/trlx/pull/364
- [feat] Support tying metadata to each prompt by @maxreciprocate in https://github.com/CarperAI/trlx/pull/421
- feat(examples): revamp simulacra example by @maxreciprocate in https://github.com/CarperAI/trlx/pull/430
- [fix] update pairwise dataloader. by @Chen9154 in https://github.com/CarperAI/trlx/pull/395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in https://github.com/CarperAI/trlx/pull/429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in https://github.com/CarperAI/trlx/pull/441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in https://github.com/CarperAI/trlx/pull/419
- Upgrade to the official Ray release instead of an unstable one by @jovany-wang in https://github.com/CarperAI/trlx/pull/455
- Pin transformers<=4.27.1 by @jovany-wang in https://github.com/CarperAI/trlx/pull/458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in https://github.com/CarperAI/trlx/pull/451
- fix(trainer): init `self.generate_sweep_kwarg` in `self.__init__` by @mymusise in https://github.com/CarperAI/trlx/pull/460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in https://github.com/CarperAI/trlx/pull/420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in https://github.com/CarperAI/trlx/pull/422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in https://github.com/CarperAI/trlx/pull/449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in https://github.com/CarperAI/trlx/pull/450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in https://github.com/CarperAI/trlx/pull/444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in https://github.com/CarperAI/trlx/pull/465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in https://github.com/CarperAI/trlx/pull/475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in https://github.com/CarperAI/trlx/pull/492
- fix type hint in `PromptPipeline.__init__` by @g-simmons in https://github.com/CarperAI/trlx/pull/496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in https://github.com/CarperAI/trlx/pull/471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in https://github.com/CarperAI/trlx/pull/506
- Fix PPO log_ratio bug by @TobiasNorlund in https://github.com/CarperAI/trlx/pull/509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in https://github.com/CarperAI/trlx/pull/510
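Two of the fixes above change PPO logging statistics: #441 computes the mean KL sequence-wise, and #450 inverts the padding-percentage calculation. The plain-Python sketch below uses nested lists in place of tensors and assumes that "sequence-wise" means summing the masked per-token KL within each sequence and then averaging over sequences, and that the padding percentage is the share of padded positions. The function names are hypothetical; this is not the trlx implementation.

```python
from typing import List

# Hypothetical sketch using nested lists in place of tensors.
# kl[i][t]: per-token KL estimate for sequence i at position t.
# mask[i][t]: 1 for a real token, 0 for padding.

def masked_kl(kl: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Zero out the KL terms at padding positions."""
    return [[k * m for k, m in zip(kl_row, mask_row)] for kl_row, mask_row in zip(kl, mask)]


def mean_kl_per_token(kl: List[List[float]], mask: List[List[int]]) -> float:
    """Token-wise mean: total KL divided by the number of real tokens."""
    total = sum(sum(row) for row in masked_kl(kl, mask))
    n_tokens = sum(sum(row) for row in mask)
    return total / n_tokens


def mean_kl_sequence_wise(kl: List[List[float]], mask: List[List[int]]) -> float:
    """Sequence-wise mean: sum KL within each sequence, then average over sequences."""
    per_sequence = [sum(row) for row in masked_kl(kl, mask)]
    return sum(per_sequence) / len(per_sequence)


def padding_percentage(mask: List[List[int]]) -> float:
    """Share of positions that are padding: 1 - real tokens / all positions."""
    n_positions = sum(len(row) for row in mask)
    n_real = sum(sum(row) for row in mask)
    return 1.0 - n_real / n_positions


kl = [[0.25, 0.5, 0.25, 1.0], [1.0, 9.0, 9.0, 9.0]]  # the 9.0 values sit on padding
mask = [[1, 1, 1, 1], [1, 0, 0, 0]]
print(mean_kl_per_token(kl, mask))      # 0.6
print(mean_kl_sequence_wise(kl, mask))  # 1.5
print(padding_percentage(mask))         # 0.375
```

Because the sequence-wise mean adds up KL over every generated token, it grows with response length, which is why it comes out larger than the per-token mean here (1.5 versus 0.6).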
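PR #421 lets metadata be tied to each prompt. One way such metadata can reach a reward function is to store it alongside the prompt text and pass it through as keyword arguments. In the sketch below, the `"prompt"` key, the `split_prompts` helper, and the `reward_fn(samples, prompts, outputs, **metadata)` signature are all assumptions for illustration, not a description of trlx's API.

```python
from typing import Any, Dict, List, Tuple


def split_prompts(records: List[Dict[str, Any]]) -> Tuple[List[str], Dict[str, List[Any]]]:
    """Hypothetical helper: separate prompt text from per-prompt metadata columns.

    Assumes every record has a "prompt" key and all records share the same keys.
    """
    prompts = [r["prompt"] for r in records]
    keys = [k for k in records[0] if k != "prompt"] if records else []
    metadata = {k: [r[k] for r in records] for k in keys}
    return prompts, metadata


def reward_fn(samples: List[str], prompts: List[str], outputs: List[str], **metadata: List[Any]) -> List[float]:
    """Hypothetical reward: 1.0 if the output contains the reference answer from metadata."""
    return [1.0 if ref in out else 0.0 for out, ref in zip(outputs, metadata["answer"])]


records = [
    {"prompt": "2 + 2 =", "answer": "4"},
    {"prompt": "3 + 5 =", "answer": "8"},
]
prompts, metadata = split_prompts(records)
outputs = [" 4", " 7"]  # stand-ins for generated continuations
samples = [p + o for p, o in zip(prompts, outputs)]
print(reward_fn(samples, prompts, outputs, **metadata))  # [1.0, 0.0]
```

Keeping metadata as per-key columns means each list lines up index-for-index with the batch of prompts and outputs.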
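Two fixes concern how work is counted in distributed training: #432 corrects the `total_steps` calculation in the SFT trainer, and #475 forces `drop_last` only for distributed runs. The arithmetic below is a hypothetical sketch, not trlx code. It assumes samples are sharded evenly across processes, that each optimizer step consumes one batch on every process, and it ignores gradient accumulation. It shows how a step count that ignores the number of processes comes out too large.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, world_size: int, drop_last: bool) -> int:
    """Steps each process takes per epoch, assuming one batch per process per step."""
    global_batch = batch_size * world_size
    if drop_last:
        # The final incomplete global batch is discarded.
        return num_samples // global_batch
    # The final incomplete global batch still counts as a step.
    return math.ceil(num_samples / global_batch)


def total_steps(num_samples: int, batch_size: int, world_size: int, epochs: int, drop_last: bool = False) -> int:
    """Total optimizer steps over all epochs under the same assumptions."""
    return epochs * steps_per_epoch(num_samples, batch_size, world_size, drop_last)


print(total_steps(1000, 8, 1, 3))                  # 375: single process
print(total_steps(1000, 8, 4, 3))                  # 96: four processes share the data
print(total_steps(1000, 8, 4, 3, drop_last=True))  # 93: incomplete last batch dropped
```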
New Contributors
- @ZHAOTING made their first contribution in https://github.com/CarperAI/trlx/pull/417
- @cauyxy made their first contribution in https://github.com/CarperAI/trlx/pull/409
- @Chen9154 made their first contribution in https://github.com/CarperAI/trlx/pull/395
- @jovany-wang made their first contribution in https://github.com/CarperAI/trlx/pull/455
- @li-plus made their first contribution in https://github.com/CarperAI/trlx/pull/451
- @mymusise made their first contribution in https://github.com/CarperAI/trlx/pull/460
- @mikljohansson made their first contribution in https://github.com/CarperAI/trlx/pull/420
- @g-simmons made their first contribution in https://github.com/CarperAI/trlx/pull/496
- @iwiwi made their first contribution in https://github.com/CarperAI/trlx/pull/506
- @TobiasNorlund made their first contribution in https://github.com/CarperAI/trlx/pull/509
- @glerzing made their first contribution in https://github.com/CarperAI/trlx/pull/486
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.6.0...v0.7.0