
v1.23.0

NVIDIA/NeMo

Release date: 2024-02-28 14:18:16


Highlights

Models

NVIDIA StarCoder2 - 15B

NeMo Canary

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/

NeMo LLM

NeMo MM

NeMo ASR

NeMo TTS

NeMo Vision

Known Issues

ASR

RNNT WER calculation when fused batch size > 1 during validation / test step()

Previously, the RNNT metric was stateful while the CTC one was not (r1.22.0, r1.23.0), so the score calculation in the RNNT joint for the fused operation worked correctly. However, with the unification of the metrics in r1.23.0, a bug was introduced: only the last sub-batch of metrics computes the scores, and results are not accumulated across sub-batches. This is patched via https://github.com/NVIDIA/NeMo/pull/8587 and will be fixed in the next release.

Workaround: explicitly disable the fused batch size during inference using the following snippet

from omegaconf import open_dict

model = ...  # any RNNT-based ASR model, e.g. loaded via ASRModel.restore_from() or from_pretrained()
decoding_cfg = model.cfg.decoding
with open_dict(decoding_cfg):
  decoding_cfg.fused_batch_size = -1  # -1 disables the fused batch path used during metric computation
model.change_decoding_strategy(decoding_cfg)

Note: This bug does not affect scores calculated via model.transcribe() (which returns only text and does not compute metrics during inference), nor when using transcribe_speech.py or speech_to_text_eval.py in examples/asr.
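
Since model.transcribe() is unaffected, a quick sanity check after applying the workaround can look like the minimal sketch below; the pretrained model name and audio path are placeholders and not part of these release notes.

from nemo.collections.asr.models import ASRModel
from omegaconf import open_dict

# Placeholder checkpoint name; any RNNT/TDT model from NGC or a local .nemo file works the same way.
model = ASRModel.from_pretrained("stt_en_conformer_transducer_large")

decoding_cfg = model.cfg.decoding
with open_dict(decoding_cfg):
  decoding_cfg.fused_batch_size = -1  # disable fused batch, as in the workaround above
model.change_decoding_strategy(decoding_cfg)

# transcribe() returns text only and does not touch the affected metric path.
print(model.transcribe(["/path/to/audio.wav"]))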

Two unit tests fail due to a change in expected results caused by a Lhotse version update.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:24.01.speech
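
For example, the pulled image can typically be started interactively with GPU access using standard Docker options (a generic sketch, not a NeMo-specific command):

docker run --gpus all -it --rm nvcr.io/nvidia/nemo:24.01.speech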

Detailed Changelogs

ASR

Changelog
  • Update link to yaml file in ASR_with_Transducers.ipynb by @Faith-Nchifor :: PR: #8014
  • Use convert_hf_dataset_to_nemo by @karpnv :: PR: #8017
  • Update asr_language_modeling.rst: Add a missing word by @martin0258 :: PR: #8007
  • spelling mistake by @orena1 :: PR: #7903
  • update asr eval by @stevehuang52 :: PR: #8045
  • fix noise aug by @stevehuang52 :: PR: #8057
  • Various fixes for typos and urls by @titu1994 :: PR: #8066
  • [Fix] Increase length check tolerance to prevent test failing by @anteju :: PR: #8067
  • Add text metrics to asr eval by @stevehuang52 :: PR: #8087
  • fix device setting to allow using accelerator cpu by @orena1 :: PR: #8084
  • .ctm in data simulator annotator compliant with RT-09 specification by @popcornell :: PR: #8004
  • Fix AST eval by @stevehuang52 :: PR: #8112
  • fix: numba.*_num_threads resets torch num_threads #8141 by @itzsimpl :: PR: #8145
  • Update dependencies by @titu1994 :: PR: #8156
  • NeMo + Lhotse integration by @pzelasko :: PR: #7880
  • Speedup RNN-T greedy decoding by @artbataev :: PR: #7926
  • [docker] Install k2 before NeMo for faster image rebuilding by @pzelasko :: PR: #8204
  • [docs] Add --force_codec to tarred dataset creation examples by @pzelasko :: PR: #8227
  • Temporarily use the previous RNN-T decoding algorithm as default by @artbataev :: PR: #8226
  • Make TDT inference not require duration params by @hainan-xv :: PR: #8207
  • Cache Aware Streaming tutorial notebook by @erastorgueva-nv :: PR: #8296
  • fix path location and branch by @nithinraok :: PR: #8304
  • Attention encoder-decoder models for multiple speech-to-text tasks … by @titu1994 :: PR: #8324
  • Remove asr webapp by @titu1994 :: PR: #8347
  • remove target at model level in aed model config [ASR] by @krishnacpuvvada :: PR: #8351
  • Add change_vocabulary and save_tokenizers() support to Multitask ASR models by @titu1994 :: PR: #8357
  • Change default beam size by @titu1994 :: PR: #8371
  • adding jenkins test for speech_to_text_aed model by @krishnacpuvvada :: PR: #8368
  • Add Finetuning tutorial with HF Datasets by @nithinraok :: PR: #8356
  • wer fix by @tbartley94 :: PR: #8404
  • add ensemble decoding fix by @nithinraok :: PR: #8427
  • Update k2 by @artbataev :: PR: #8492

TTS

Changelog
  • [TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
  • Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
  • [TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
  • Added VectorQuantizer base class by @anteju :: PR: #8011

LLMs

Changelog
  • Add interface to set NCCL options of each process group by @erhoo82 :: PR: #7923
  • Support O2 training of PEFT and SFT by @cuichenx :: PR: #7971
  • [NLP] Access scaler only in FP16 case by @janekl :: PR: #7916
  • [NLP] Minor improvements in Llama conversion script by @janekl :: PR: #7978
  • [NLP] Use helpers from utils_funcs.py in Llama conversion by @janekl :: PR: #7979
  • [NLP] Remove replace_sampler_ddp (deprecated in Trainer) by @janekl :: PR: #7981
  • Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 by @trias702 :: PR: #7920
  • Remove deprecated arguments from TE's TransformerLayer by @jbaczek :: PR: #7917
  • Add All Multimodal Source Code by @yaoyu-33 :: PR: #7791
  • First draft of mcore bert model in NeMo by @shanmugamr1992 :: PR: #7814
  • Support Falcon Variants (7B/40B/180B) in Mcore NeMo by @xuanzic :: PR: #7666
  • FSDP + Tensor Parallelism by @erhoo82 :: PR: #7897
  • Packed Sequence by @cuichenx :: PR: #7945
  • Adding method back that was removed accidentally by @ericharper :: PR: #8038
  • [NLP] ArtifactItem with init=True to make it debuggable by @janekl :: PR: #7980
  • SFT patch: (1) enable sequence parallelism and (2) enable profile by @erhoo82 :: PR: #7963
  • migration to PTL 2.0 for spellmapper model by @bene-ges :: PR: #7924
  • Change the megatron config lr scheduler default and fix to change partitions script by @shan18 :: PR: #8094
  • (1) Add SHARP interface to M-CORE, (2) use send/recv to send train loss to the first rank instead of b-cast by @erhoo82 :: PR: #7793
  • Reconfigure limit_val_batches only for int by @athitten :: PR: #8099
  • Fixing wrapper and moving it to base class by @shanmugamr1992 :: PR: #8055
  • fix gated_linear_unit bug by @Agoniii :: PR: #8042
  • Fix Adapter for MCore models by @cuichenx :: PR: #8124
  • add war fix for sync issues by @gshennvm :: PR: #8130
  • Improve PEFT UX by @cuichenx :: PR: #8131
  • Enhance flexibility by passing callbacks as method argument by @michal2409 :: PR: #8015
  • context parallelism by @xrennvidia :: PR: #7739
  • Make pipelined TP comm overlap available with mcore by @erhoo82 :: PR: #8005
  • remove deprecated scripts by @arendu :: PR: #8138
  • adding OnlineSampleMapping by @arendu :: PR: #8137
  • Add distopt support for FP8 params and BF16 optimizer state by @timmoon10 :: PR: #7909
  • Revert adding OnlineSampleMapping by @pablo-garay :: PR: #8164
  • Token count and sequence length logging for MegatronGPTSFTModel by @vysarge :: PR: #8136
  • Use latest apex internal API by @jbaczek :: PR: #8129
  • tune specific params in the base model by @arendu :: PR: #7745
  • Virtual pipeline parallel support for MegatronGPTSFTModel by @vysarge :: PR: #7964
  • removed deprecated peft model by @arendu :: PR: #8183
  • remove more deprecated files by @arendu :: PR: #8169
  • Pre-generate cu_seqlens argmin and max_seqlen to remove host-to-device sync by @erhoo82 :: PR: #8108
  • Add the interface to use SHARP to FSDP strategy by @erhoo82 :: PR: #8202
  • Multimodal required NLP base model changes by @yaoyu-33 :: PR: #8188
  • [NLP] Improve and unify loading state_dict for community models by @janekl :: PR: #7977
  • Rename Finetuning Scripts by @cuichenx :: PR: #8201
  • Final multimodal PR with our recent developments on MM side by @yaoyu-33 :: PR: #8127
  • Add include_text parameter to SFT dataloaders by @Kipok :: PR: #8198
  • Add random_seed argument to generate by @Kipok :: PR: #8162
  • Added support for neptune logger by @harishankar-gopalan :: PR: #8210
  • Pre-compute max_seqlen and cu_seqlens_argmin in all model-parallel cases by @erhoo82 :: PR: #8222
  • Use PackedSeqParams in accordance with changes in Megatron-LM by @cuichenx :: PR: #8205
  • Fix to peft & virtual pipeline parallel unsupported check by @vysarge :: PR: #8216
  • Fixed the tp overlap switch by @sanandaraj5597 :: PR: #8195
  • add knobs for rope/swiglu fusion by @lhb8125 :: PR: #8184
  • Added sample cpu_offloading switch to YAML by @sanandaraj5597 :: PR: #8148
  • Syncing random seed between ranks in generate by @Kipok :: PR: #8230
  • add first_val_step to mcore scheduler by @JimmyZhang12 :: PR: #8150
  • Correct padding for SFT input data to account for sequence parallel + TE's fp8 op dimension requirements by @vysarge :: PR: #8240
  • Mistral 7b conversion script by @akoumpa :: PR: #8052
  • switch to mcore dataset [with FIM support] by @dimapihtar :: PR: #8149
  • Mixtral to NeMo conversion script. by @akoumpa :: PR: #8155
  • fixes to accomendate mcore changes by @HuiyingLi :: PR: #8261
  • Allow MegatronPretrainingRandomSampler to do multi-epoch training by @trias702 :: PR: #8239
  • Add dist ckpt support for regular optimizers by @mikolajblaz :: PR: #7749
  • add deallocate pipeline output optimization by @JimmyZhang12 :: PR: #8279
  • Fix memory leak caused by context parallelism hanging references by omegaconf by @JimmyZhang12 :: PR: #8299
  • distributed fused adam + rampup bs support by @dimapihtar :: PR: #8302
  • Update PEFT Doc by @cuichenx :: PR: #8262
  • Converter script fixes for mixtral/mistral by @akoumpa :: PR: #8272
  • Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 by @erhoo82 :: PR: #8334
  • Enable megatron core loggers for GPT pretraining by @ashbhandare :: PR: #8354
  • mcore ds fix by @dimapihtar :: PR: #8283
  • release updates by @dimapihtar :: PR: #8378
  • Mcore customization doc by @HuiyingLi :: PR: #8298
  • updated link to pubmed by @nithinraok :: PR: #8402
  • mcore customization doc minor fix by @HuiyingLi :: PR: #8421
  • Fixing mcore bert for TP, PP and SP by @shanmugamr1992 :: PR: #8336
  • Add settings to suppress bf16 compile errors in CI on V100 by @athitten :: PR: #8481
  • MoE parameter passing by @akoumpa :: PR: #8255
  • Add fp8 support for SD/Update notebook paths by @Victor49152 :: PR: #8489

NeMo Tools

Changelog
  • SDE bugfix log by @Jorjeous :: PR: #8430

General Improvements

Changelog
  • Add news section to README by @ericharper :: PR: #7984
  • Fixing conversion script to work for code llama by @shanmugamr1992 :: PR: #7997
  • Fix crash when converting to mcore a model using rotary embeddings by @odelalleau :: PR: #7998
  • Added a procedure for Windows users, README by @Jorjeous :: PR: #7942
  • Update manifest.py to speedup loading tarred datasets by @stevehuang52 :: PR: #7900
  • [Fix] Fixed name of a test by @anteju :: PR: #7986
  • Fix lora merge script by @cuichenx :: PR: #8113
  • Support transcoding audio formats when saving tarred datasets (FLAC, OPUS) by @pzelasko :: PR: #8102
  • README edit to change Apple Silicon install instructions (to fix a break introduced by pytorch 2) by @stephenmcconnachie :: PR: #8122
  • Fixes NVIDIA/apex installation to not erroneously install the pkg by @terrykong :: PR: #8126
  • Graphviz fix by @GNroy :: PR: #7843
  • Update README.rst by @fayejf :: PR: #8154
  • Fix TP>1 issue for conversion script by @cuichenx :: PR: #8144
  • Support torch jit script by @artbataev :: PR: #8027
  • NeMo Multimodal Docs and Tests Initial PR by @yaoyu-33 :: PR: #8028
  • Remove left-over prints in NeMo+Lhotse code by @pzelasko :: PR: #8180
  • Upgrade to DLFW PyTorch 23.12 by @ericharper :: PR: #8163
  • Add Lhotse support for key in NeMo manifests by @pzelasko :: PR: #8197
  • Fix CPU Initialization and TP>1 for LoRA Merge Script by @cuichenx :: PR: #8199
  • Add support in Neural Typecheck to disable semantic checks by @titu1994 :: PR: #8212
  • Pin lhotse=1.19.2 in r1.23.0 by @pzelasko :: PR: #8303
  • Multimodal r1.23.0 bug fix by @yaoyu-33 :: PR: #8315
  • MCore dataset compatibility for tokenizers by @vysarge :: PR: #8390
  • Update NFA video download link by @erastorgueva-nv :: PR: #8406
  • Update MM Dataprep Tutorial by @cuichenx :: PR: #8410
  • Fix dreambooth data sampler issue by @yaoyu-33 :: PR: #8400
  • Fix a bug in CTM line processing function for multi-speaker data simulations by @tango4j :: PR: #8416
  • Akoumparouli/mistral bugfix by @akoumpa :: PR: #8353
  • pin to 0.5.0 by @ericharper :: PR: #8465
  • Update NeMo Multimodal Requirements by @yaoyu-33 :: PR: #8515
  • Fix link in multimodal dataprep tutorial by @cuichenx :: PR: #8517


Assets
  • asset-githubio-home-nemo_fw_llm_mm.png (368.52 KB)
  • asset-githubio-home-nemo_fw_speech.png (363.58 KB)
  • asset-githubio-home-sdxl_trt_fp16_1.png (1.26 MB)
  • asset-githubio-home-sdxl_trt_fp16_2.png (1.36 MB)
  • asset-githubio-home-sdxl_trt_fp16_3.png (1.25 MB)
  • asset-githubio-home-sdxl_trt_int8_1.png (1.32 MB)
  • asset-githubio-home-sdxl_trt_int8_2.png (1.29 MB)
  • asset-githubio-home-sdxl_trt_int8_3.png (1.3 MB)
  • hybrid_asr_tts_model.webp (45.75 KB)
  • spectral_codec_architecture_fullband.png (183.19 KB)
  • spectral_codec_architecture_multiband.png (204.79 KB)
