v0.4.0
Released: 2024-07-12 05:52:25
Latest allenai/OLMo release: v0.5.0 (2024-08-27 10:00:22)
What's new
Added 🎉
- Added clipping fix to the `Optimizer` class to make it work with FSDP `no_shard` and DDP.
- Added tests comparing grad norm differences between the torch optimizer-and-clipping and the OLMo optimizer-and-clipping, on both CPU and GPU.
- Expose memmap dtype in data config
- Added support for DDP training.
- Added caching to disk of HF datasets used in downstream evals
- Added FLOPs logging
- Added configs for OLMo tiny set of models
- Added configuration field `optimizer.record_update_metrics`, which defaults to `False`, but when set to `True` will trigger AdamW to collect the step size norm and absolute max for each parameter.
- Added configuration field `optimizer.selective_updates`, which defaults to `False`, but when set to `True` will tell the optimizer to skip updating the parameter and state when the corresponding gradient is 0 (see the first sketch after this list).
- Added `olmo_data`, a package holding data files like tokenizers.
- Added ability to load tokenizers from `olmo_data` package data (see the second sketch after this list).
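To make `optimizer.selective_updates` concrete, here is a minimal sketch of the skip-when-grad-is-zero idea; the function name and signature are illustrative assumptions, not the repository's actual internals (the real optimizer lives in `olmo/optim.py`):

```python
import torch

def adamw_moments_selective(grad, exp_avg, exp_avg_sq, beta1=0.9, beta2=0.95,
                            selective_updates=True):
    """Advance AdamW's moment estimates, optionally skipping zero-grad entries.

    Illustrative sketch only, not the repository's code.
    """
    new_avg = beta1 * exp_avg + (1 - beta1) * grad
    new_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
    if selective_updates:
        # Keep the old state wherever the gradient is exactly 0,
        # i.e. skip the update for untouched parameters.
        mask = grad != 0
        new_avg = torch.where(mask, new_avg, exp_avg)
        new_sq = torch.where(mask, new_sq, exp_avg_sq)
    return new_avg, new_sq
```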
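And a hypothetical sketch of loading a tokenizer shipped as `olmo_data` package data. The helper name and the `tokenizers/` subdirectory are assumptions; the underlying mechanism (`importlib.resources`) is standard Python packaging:

```python
from importlib import resources

from tokenizers import Tokenizer

def load_packaged_tokenizer(filename: str) -> Tokenizer:
    # Resolve a data file bundled inside the installed olmo_data package.
    resource = resources.files("olmo_data") / "tokenizers" / filename
    with resources.as_file(resource) as path:
        return Tokenizer.from_file(str(path))
```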
Changed ⚠️
- Added original legacy unsharding implementation back, as the default. The new shared memory implementation can be used by passing `use_legacy_shared_mem_impl` to `unshard.py`.
- Refactored weight initialization. IMPORTANT: this does not maintain backwards-compatibility with older configs; the jobs will still run, but may produce different outputs.
- Changed the behavior of the Lion optimizer to only record the update cosine similarity when `optimizer.record_update_metrics` is `True`, in order to be consistent with the API (see the sketch after this list).
- Added HF datasets into `olmo_data`, and changed downstream eval to load from the package.
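A sketch of the Lion change above (names are illustrative, not the repository's code): the update cosine-similarity metric is only computed when the flag is set, so the extra work is skipped by default.

```python
import torch.nn.functional as F

def maybe_record_update_cos_sim(update, signed_update, metrics,
                                record_update_metrics: bool):
    # Lion applies sign() to its update; the metric compares the signed
    # step with the raw momentum-interpolated update. Only pay for the
    # extra compute when the config asks for it.
    if record_update_metrics:
        cos_sim = F.cosine_similarity(update.flatten(),
                                      signed_update.flatten(), dim=0)
        metrics["update_cos_sim"] = cos_sim.item()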
Fixed ✅
- Changed from `ignored_index` to `ignore_index` for `cross_entropy_loss` when `flash-attn>=2.5.8` (see the version-guard sketch after this list).
- Make `hf_olmo` support `AutoModelForCausalLM` and similar HF methods again (see the usage example after this list).
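The `ignore_index` fix boils down to a keyword rename in flash-attn 2.5.8. A minimal sketch of the version guard, assuming the `packaging` library is available (not the repository's exact code):

```python
from importlib.metadata import version

from packaging.version import parse

# flash-attn renamed the keyword `ignored_index` -> `ignore_index` in
# 2.5.8, so callers must pick the right name for the installed version.
ce_kwargs = {}
if parse(version("flash-attn")) >= parse("2.5.8"):
    ce_kwargs["ignore_index"] = -100
else:
    ce_kwargs["ignored_index"] = -100
```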
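With the `hf_olmo` fix, the standard Hugging Face auto methods work once the package is imported, which registers OLMo with the `Auto*` classes. A short usage example, assuming a released checkpoint such as `allenai/OLMo-7B`:

```python
import hf_olmo  # noqa: F401 -- importing registers OLMo with the HF Auto* classes
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
```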
Commits
d423c11a Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b2 Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31b Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a7525 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8ae Add option to skip optim steps for 0 grad params (#636)
cbc7c25b Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658bf Add option to record step size metrics from AdamW (#605)
a3e2ea7b multiple epoch fix
a1f118aa Merge pull request #628 from allenai/olmo-tiny
d7994c86 Fix Z-loss calculation (#634)
a5539f42 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a2626 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b117 Make olmo-core checkpointer more robust on weka (#624)
ddc88471 Merge pull request #612 from allenai/ddp
41ed20a6 Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa99 Merge pull request #604 from allenai/WandbDiff
e5d63a37 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df77 Officially add OLMo-core as a dependency (#615)
72159aec Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc3 Merge pull request #607 from allenai/rewrite-init
578234d8 Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8a Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
26392798 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e894081 Create sensible filenames
02a8a586 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d479 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb1 Merge pull request #599 from allenai/train-olmo-large
55c1e2f9 Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe3 Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154d Merge pull request #579 from MLgdg/main
652c7456 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec28097 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b89 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d55755 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe00 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a07 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a8 Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c57 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb7 Merge pull request #575 from allenai/shanea/add-weka
5c721cc8 Fix GPU tests CI (#574)
467adcc9 Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12ea Merge pull request #565 from allenai/readme
ccc49fde Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd05 Merge pull request #512 from liaoleo/main
295d3096 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d2 Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de956 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d69 Merge pull request #520 from allenai/add-ce-loss-metric
Assets
1. ai2_olmo-0.4.0-py3-none-any.whl (11.9 MB)
2. ai2_olmo-0.4.0.tar.gz (11.54 MB)