v0.5.0
Released: 2024-08-27 10:00:22
What's new
- Fixed conversion to HuggingFace model for DDP-trained models.
- Added support for remote source and destination for HuggingFace model conversion.
Added 🎉
- Added support for document masking via flash-attn during training with `--data.generate_doc_lengths` (see the sketch after this list).
- Added config options for `model.norm_after`, `model.scale_emb_init`, and `auxiliary_loss_multiplier` (used with zloss); see the override sketch after this list.
- Added scripts for running experiments on qk_norm, norm reordering, and zloss.
- Added `model.rope_theta` configuration option.
- Added `model.embedding_layer_norm` configuration option for adding a LN to the embeddings.
- Added `model.emb_init_std` configuration option to override the standard deviation used to initialize the embeddings.
- Added downstream eval task for requests dumped from oe-eval tasks.
- Added `CosLinearEnvelope` scheduler, which is a pointwise product of a cosine schedule and a linear decay (see the sketch after this list).
- Added ability to save outputs of submodules for debugging purposes.
- Versioned the Dolma FLAN change in `named_data_mix.py`.
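For readers new to document masking: the flag packs multiple documents into one training sequence but restricts attention to within-document spans by handing per-document lengths to flash-attn's varlen kernels. A minimal sketch of that plumbing, assuming flash-attn 2's `flash_attn_varlen_func`; the helper name and shapes here are illustrative, not OLMo's actual code path:

```python
import torch

def doc_lengths_to_cu_seqlens(doc_lens: torch.Tensor) -> torch.Tensor:
    """Convert per-document lengths, e.g. [3, 5, 4], into the cumulative
    boundaries [0, 3, 8, 12] that flash-attn's varlen kernels expect."""
    return torch.cat([
        torch.zeros(1, dtype=torch.int32, device=doc_lens.device),
        doc_lens.cumsum(0, dtype=torch.int32),
    ])

# With cu_seqlens in hand, attention is confined to each document:
#   from flash_attn import flash_attn_varlen_func
#   cu = doc_lengths_to_cu_seqlens(doc_lens)
#   out = flash_attn_varlen_func(
#       q, k, v,                          # packed as (total_tokens, n_heads, head_dim)
#       cu_seqlens_q=cu, cu_seqlens_k=cu,
#       max_seqlen_q=int(doc_lens.max()), max_seqlen_k=int(doc_lens.max()),
#       causal=True,
#   )
```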
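The new model options can be set in the YAML config or, like other OLMo options, as dotted overrides. A hedged sketch, assuming `TrainConfig.load` accepts an `overrides` list of dotted strings as used by the training script; the config path and values are placeholders:

```python
from olmo.config import TrainConfig

# Hypothetical values for illustration only; pick settings for your own run.
cfg = TrainConfig.load(
    "configs/official/OLMo-7B.yaml",
    overrides=[
        "model.rope_theta=500000",
        "model.embedding_layer_norm=true",
        "model.emb_init_std=0.02",
    ],
)
```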
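Since the release note describes `CosLinearEnvelope` precisely, its shape is easy to write down. A minimal sketch of the math only; the function name and signature are assumptions, and the warmup and minimum-LR handling of the repo's real scheduler are omitted:

```python
import math

def cos_linear_envelope(step: int, max_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step`: a cosine curve multiplied pointwise by a
    linear decay, so the schedule reaches zero faster than cosine alone."""
    t = min(step, max_steps) / max_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))  # decays 1 -> 0
    linear = 1.0 - t                              # decays 1 -> 0
    return lr_min + (lr_max - lr_min) * cosine * linear
```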
Changed ⚠️
- Changed the default distributed training strategy from single-GPU to FSDP.
- Fixed the behavior of `effective_memmap_dtype` to prevent unrecognized dtypes from being parsed as `uint16`.
Fixed ✅
- Fixed restarting a training run in later epochs so that we no longer need to set the flag `--epoch=INT`.
- Swapped in the correct FLAN data mix.
- Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core`- and `torch_new`-style checkpoints (see the usage sketch after this list).
- Fixed `preserve_rng_state` being incorrectly set to `False` when doing gradient checkpointing with dropout (see the illustration after this list).
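With the checkpoint fix, both styles should load through the same entry point. A minimal usage sketch; the import path and `device` keyword are assumptions based on the repo's inference API, so check the source if they have moved:

```python
from olmo.model import OLMo

# Works for both `olmo_core`- and `torch_new`-style checkpoint directories.
model = OLMo.from_checkpoint("path/to/checkpoint", device="cpu")
model.eval()
```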
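Context for the `preserve_rng_state` fix: gradient checkpointing re-runs the forward pass during backward, and if the RNG state is not restored, dropout samples a different mask on the recompute, so gradients no longer match the original forward. A plain-PyTorch illustration (not OLMo code) of the flag doing its job:

```python
import torch
from torch.utils.checkpoint import checkpoint

drop = torch.nn.Dropout(p=0.5)
x = torch.randn(4, 8, requires_grad=True)

# preserve_rng_state=True (PyTorch's default) stashes the RNG state at
# checkpoint time and restores it on recompute, so dropout applies the same
# mask in the original forward and in the recomputed forward during backward.
y = checkpoint(drop, x, use_reentrant=False, preserve_rng_state=True)
y.sum().backward()
```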
Commits
cee1a5df Merge pull request #710 from allenai/version-dolma-flan-change
213a6395 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d405 Fix Conversion Issues + add support for remote upload. (#694)
78d79a51 Merge pull request #709 from allenai/shanea/debugging-docs
91478898 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc0 Merge pull request #698 from allenai/shanea/compare-model-state
e5217cfa Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e6 Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce34 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53fe Merge pull request #702 from chrisc36/main
0bc7f6c7 Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c322 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb9 Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe09 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a3 Merge pull request #686 from allenai/fix-from-checkpoint
c482df74 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e307106 Merge pull request #629 from allenai/epwalsh/amberish
4e004602 Add support for document masking during training (#661)
b45002e8 make epoch logging less confusing
1b7d2756 Fix restarts in later epochs (#670)
345edc6f Merge branch 'main' of https://github.com/allenai/LLM
66d2be71 Revert "Update Beaker image"
07572231 Merge pull request #649 from allenai/ModelLadder
90b3889b Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212f Merge pull request #616 from allenai/chameleon
d627c94e Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296a Improving memmap type parser (#663)
b55fb5f7 Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe07 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d536 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff6 Merge pull request #656 from jeqcho/patch-1
20b82f86 Merge pull request #653 from allenai/shanea/olmo-v0.4.0
Assets
1. ai2_olmo-0.5.0-py3-none-any.whl (26.85 MB)
2. ai2_olmo-0.5.0.tar.gz (26.49 MB)