v0.5.0
Release date: 2023-02-23 07:50:34
Latest CarperAI/trlx release: v0.7.0 (2023-06-24 06:21:52)
Highlights
- Initial NeMo ILQL integration, paving the way for large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
- In-depth example showcasing `trlx` usage on AnthropicAI's Helpful & Harmless dataset https://github.com/CarperAI/trlx/tree/main/examples/hh
- Improved ILQL modeling integration with Hugging Face `transformers`. Users can now work with `AutoModelForCausalLMWithILQLHeads` objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.
What's Changed
- Add `wandb` group naming by @jon-tow in https://github.com/CarperAI/trlx/pull/188
- Update `reward_fn` signatures in examples by @jon-tow in https://github.com/CarperAI/trlx/pull/190
- Add tokenizer config by @reciprocated in https://github.com/CarperAI/trlx/pull/189
- Fix extraction of `mixed_precision` option for deepspeed by @reciprocated in https://github.com/CarperAI/trlx/pull/197
- Fix `summarize_rlhf` inference checkpoint paths by @jon-tow in https://github.com/CarperAI/trlx/pull/194
- Make the config loading consistent across all example scripts by @shermansiu in https://github.com/CarperAI/trlx/pull/192
- Make `Trainer.save_pretrained` sub-directory optional by @jon-tow in https://github.com/CarperAI/trlx/pull/201
- Update Readme to include T5 models by @aaronrmm in https://github.com/CarperAI/trlx/pull/198
- Make `make_head` accept dtype parameter by @reciprocated in https://github.com/CarperAI/trlx/pull/213
- Enable training with Tensorboard tracking by @marcobellagente93 in https://github.com/CarperAI/trlx/pull/209
- Support nested updates in `merge` by @cat-state in https://github.com/CarperAI/trlx/pull/219
- Fix typo reward normalize summarize by @PhungVanDuy in https://github.com/CarperAI/trlx/pull/221
- Update stale comment from results table by @jon-tow in https://github.com/CarperAI/trlx/pull/222
- Fix undefined trackers property by @alan-cooney in https://github.com/CarperAI/trlx/pull/224
- Fix tokenizer missing from config.to_dict() by @alan-cooney in https://github.com/CarperAI/trlx/pull/228
- Make experiment tracking optional by @jon-tow in https://github.com/CarperAI/trlx/pull/226
- read tokenizer path from config correctly by @JustinAWei in https://github.com/CarperAI/trlx/pull/230
- Add devcontainer support by @alan-cooney in https://github.com/CarperAI/trlx/pull/196
- fix: change lora_a:float to lora_r:int by @aaronrmm in https://github.com/CarperAI/trlx/pull/235
- Bump `isort` to hotfix CI code quality workflow by @jon-tow in https://github.com/CarperAI/trlx/pull/237
- Fix optional tracking in `accelerator.log` by @jon-tow in https://github.com/CarperAI/trlx/pull/233
- Improve documentation/comments on the random walk example by @alan-cooney in https://github.com/CarperAI/trlx/pull/208
- Update link to "Learning to Summarize from Human Feedback" by @jon-tow in https://github.com/CarperAI/trlx/pull/241
- Fix deepspeed state saving under `save_best` condition by @reciprocated in https://github.com/CarperAI/trlx/pull/242
- added colab notebook by @smellslikeml in https://github.com/CarperAI/trlx/pull/244
- [style] Increase black's line length by @reciprocated in https://github.com/CarperAI/trlx/pull/250
- Add help string to get_advantages_and_returns by @pesvut in https://github.com/CarperAI/trlx/pull/225
- Filter out empty responses by @reciprocated in https://github.com/CarperAI/trlx/pull/265
- NeMo Integrate by @cat-state in https://github.com/CarperAI/trlx/pull/125
- Add multi-process logger utility for status monitoring by @jon-tow in https://github.com/CarperAI/trlx/pull/254
- Add `NeMo` support info to `README` by @jon-tow in https://github.com/CarperAI/trlx/pull/275
- Fix distributed dataloaders & deduplicate eval by @reciprocated in https://github.com/CarperAI/trlx/pull/276
- Improve PPO readability by @alan-cooney in https://github.com/CarperAI/trlx/pull/210
- Add T5 to delta modifier map by @aaronrmm in https://github.com/CarperAI/trlx/pull/234
- [fix] Set deepspeed's fp16 `auto_cast` to false by @reciprocated in https://github.com/CarperAI/trlx/pull/279
- Rename remaining `logprobs_from_logits` call by @jon-tow in https://github.com/CarperAI/trlx/pull/281
- [feat] Add Accelerate SFT Trainer by @reciprocated in https://github.com/CarperAI/trlx/pull/280
- Add Colab Notebook for Sentiment by @zswitten in https://github.com/CarperAI/trlx/pull/285
- Remove `pylance` installs from devcontainer by @jon-tow in https://github.com/CarperAI/trlx/pull/296
- Move notebooks to examples dir by @jon-tow in https://github.com/CarperAI/trlx/pull/294
- [fix] Summarize config discrepancy by @reciprocated in https://github.com/CarperAI/trlx/pull/293
- Make Git check optional by @cat-state in https://github.com/CarperAI/trlx/pull/299
- refactor: remove orchestrator abstraction from API by @jon-tow in https://github.com/CarperAI/trlx/pull/289
- Set `add_special_tokens=False` to not add EOS unexpectedly by @cat-state in https://github.com/CarperAI/trlx/pull/287
- [feat] Gather experience samples by @reciprocated in https://github.com/CarperAI/trlx/pull/305
- [fix] Make `gather_for_metrics` usage more strict by @reciprocated in https://github.com/CarperAI/trlx/pull/315
- Add helpful and harmless example by @reciprocated in https://github.com/CarperAI/trlx/pull/128
- Adopt `PreTrainedModelWrapper` for Hugging Face models by @jon-tow in https://github.com/CarperAI/trlx/pull/215
New Contributors
- @shermansiu made their first contribution in https://github.com/CarperAI/trlx/pull/192
- @aaronrmm made their first contribution in https://github.com/CarperAI/trlx/pull/198
- @marcobellagente93 made their first contribution in https://github.com/CarperAI/trlx/pull/209
- @alan-cooney made their first contribution in https://github.com/CarperAI/trlx/pull/224
- @JustinAWei made their first contribution in https://github.com/CarperAI/trlx/pull/230
- @smellslikeml made their first contribution in https://github.com/CarperAI/trlx/pull/244
- @pesvut made their first contribution in https://github.com/CarperAI/trlx/pull/225
- @zswitten made their first contribution in https://github.com/CarperAI/trlx/pull/285
Full Changelog: https://github.com/CarperAI/trlx/compare/v0.4...v0.5.0