NVIDIA/NeMo v2.0.0rc0

Released: 2024-06-06 13:46:45

Highlights

  • LLM and MM
    • Models
    • Performance
    • Export
  • Speech (ASR & TTS)
    • Models
    • Perf Improvements
    • Customization
    • Misc

Detailed Changelogs

ASR

Changelog
  • Enable using hybrid asr models in CTC Segmentation tool by @erastorgueva-nv :: PR: #8828
  • TDT confidence fix by @GNroy :: PR: #8982
  • Fix union type annotations for autodoc+mock-import rendering by @pzelasko :: PR: #8956
  • NeMo dev doc restructure by @yaoyu-33 :: PR: #8896
  • Improved random seed configuration for Lhotse dataloaders with docs by @pzelasko :: PR: #9001
  • Fix #8948, allow preprocessor to be stream captured to a cuda graph when doing per_feature normalization by @galv :: PR: #8964
  • [ASR] Support for transcription of multi-channel audio for AED models by @anteju :: PR: #9007
  • Add ASR latest news by @titu1994 :: PR: #9073
  • Fix docs errors and most warnings by @erastorgueva-nv :: PR: #9006
  • PyTorch CUDA allocator optimization for dynamic batch shape dataloading in ASR by @pzelasko :: PR: #9061
  • RNN-T and TDT inference: use CUDA graphs by default by @artbataev :: PR: #8972
  • Fix #8891 by supporting GPU-side batched CTC Greedy Decoding by @galv :: PR: #9100
  • Update branch for notebooks and ci in release by @ericharper :: PR: #9189
  • Enable CUDA graphs by default only for transcription by @artbataev :: PR: #9196
  • rename paths2audiofiles to audio by @nithinraok :: PR: #9209
  • Fix ASR_Context_Biasing.ipynb contains FileNotFoundError by @andrusenkoau :: PR: #9233
  • Cherrypick: Support dataloader as input to audio for transcription (#9201) by @titu1994 :: PR: #9235
  • Update Online_Offline_Microphone_VAD_Demo.ipynb by @stevehuang52 :: PR: #9252
  • Dgalvez/fix greedy batch strategy name r2.0.0rc0 by @galv :: PR: #9243
  • Accept None as an argument to decoder_lengths in GreedyBatchedCTCInfer::forward by @galv :: PR: #9246
  • Fix loading github raw images on notebook by @nithinraok :: PR: #9282
  • typos by @nithinraok :: PR: #9314
  • Re-enable cuda graphs in training modes. by @galv :: PR: #9338
  • add large model stable training fix and contrastive loss update for variable seq by @nithinraok :: PR: #9259
  • Fix conv1d package in r2.0.0rc0 by @pablo-garay :: PR: #9369
  • Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer. (#9347) by @titu1994 :: PR: #9350
  • Make a backward compatibility for old MSDD configs in label models by @tango4j :: PR: #9377
  • Force diarizer to use CUDA if cuda is available and if device=None. by @tango4j :: PR: #9380
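
Several of the ASR changes above converge on the transcription entry point: the input argument rename to `audio` (#9209), dataloader input support for transcription (#9235), and CUDA graph decoding enabled by default for transcription with RNN-T/TDT models (#9196). The snippet below is a minimal sketch of the resulting `transcribe()` call; the pretrained checkpoint name is an assumption, and any released NeMo ASR model works the same way.

```python
# Minimal transcription sketch reflecting the PRs above; the checkpoint name is
# an assumption -- substitute any released NeMo ASR model.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_fastconformer_ctc_large")

# PR #9209 renames the transcription input argument to `audio`; per PR #9235 it
# also accepts a dataloader in place of a list of file paths.
transcripts = asr_model.transcribe(audio=["sample1.wav", "sample2.wav"], batch_size=2)
print(transcripts)
```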

TTS

Changelog
  • [TTS] Add tutorial for training audio codecs by @rlangman :: PR: #8723
  • Update radtts.py by @blisc :: PR: #9097
  • [Nemo CICD] RADTTS test optional by @pablo-garay :: PR: #9112
  • Remove Radtts CI test by @blisc :: PR: #9144
  • Fix T5 G2P Input and Output Types by @blisc :: PR: #9224

LLM and MM

Changelog
  • Rachitg/dpa by @rachitgarg91 :: PR: #8911
  • Remove precision args in trainer due to PTL update by @yaoyu-33 :: PR: #8908
  • Huvu/mcore retro by @huvunvidia :: PR: #8861
  • fsdp tp > 1 bug fix by @dimapihtar :: PR: #8947
  • Fix memory leak at loss func by @minitu :: PR: #8868
  • change the condition for get qkv tensor from linear_qkv output in mcoremixin by @HuiyingLi :: PR: #8965
  • Add safety checks for 'data' key in MegatronGPTModel cfg by @HuiyingLi :: PR: #8991
  • [NeMo-UX] Adding MegatronParallel by @cuichenx :: PR: #8987
  • Skip top_p computations when set to 1.0 by @odelalleau :: PR: #8905
  • Gemma bug by @cuichenx :: PR: #8962
  • [NeMo-UX] Adding megatron strategy by @marcromeyn :: PR: #8995
  • Quantized checkpoint support in export and deploy modules by @janekl :: PR: #8859
  • add geglu to mlp swap by @JRD971000 :: PR: #8999
  • add timeout for new_group by @acphile :: PR: #8998
  • Zero-shot evaluation pipeline for mcore RETRO by @huvunvidia :: PR: #8941
  • Added fusion for squared relu by @sanandaraj5597 :: PR: #8963
  • Developer Documents for mcore RETRO by @huvunvidia :: PR: #9026
  • [NeMo-UX] Adding GPTModel & MockDataModule by @marcromeyn :: PR: #9011
  • Adding unit test for mcore RETRO model by @huvunvidia :: PR: #9022
  • docs and simplification of cmd args by @arendu :: PR: #8979
  • [NeMo-UX] Add checkpoint-io to MegatronStrategy by @marcromeyn :: PR: #9057
  • Enable Sequence Packing and Pipeline Parallel in NeVA by @yaoyu-33 :: PR: #8957
  • Mingyuanm/add back fp8 support to sd by @Victor49152 :: PR: #9070
  • unfused lora by @arendu :: PR: #9004
  • Handle case where num_query_groups is set to null for LoRA config setup by @vysarge :: PR: #9075
  • Alit/griffin by @JRD971000 :: PR: #9021
  • Implement DistributedCheckpointIO by @mikolajblaz :: PR: #9016
  • Video Neva Pretraining + Inference Implementation by @paul-gibbons :: PR: #9095
  • HF to .nemo for Mixtral-8x22B-instruct by @akoumpa :: PR: #9060
  • mcore ds updates by @dimapihtar :: PR: #8951
  • Alit/griffin perf by @JRD971000 :: PR: #9107
  • Add assert for max_steps to be positive in MegatronGPTSFTModel by @athitten :: PR: #9110
  • Extend sequence length padding for GPT SFT to account for context parallel by @vysarge :: PR: #8869
  • Update gpt dataset config parameter for mock by @thomasdhc :: PR: #9118
  • Add Mcore DistributedDataParallel and distributed optimizer into Nemo by @gdengk :: PR: #9034
  • Revert "Add assert for max_steps to be positive in MegatronGPTSFTMode… by @pablo-garay :: PR: #9128
  • scripts to convert HF lora to nemo by @arendu :: PR: #9102
  • Prevent duplicated checkpoints by @mikolajblaz :: PR: #9015
  • add TN/ITN link in speech tools list by @erastorgueva-nv :: PR: #9142
  • Cleanup deprecated files and temporary changes by @cuichenx :: PR: #9088
  • Use DP+CP groups as the FSDP sharding domain by @erhoo82 :: PR: #9145
  • CUDA memory profile by @erhoo82 :: PR: #9096
  • Fix missing func for T5 model by @gdengk :: PR: #9141
  • Add knob for load_directly_on_device by @mikolajblaz :: PR: #9125
  • Revert rope fusion defaults by @cuichenx :: PR: #9238
  • Update nemo.export module for quantized models by @janekl :: PR: #9250
  • Fix circular import for MM dataprep notebook by @cuichenx :: PR: #9287
  • neva media_type + text generation default fix by @paul-gibbons :: PR: #9257
  • fix lora and ptuning and isort/black by @oyilmaz-nvidia :: PR: #9290
  • add check if num layers is divisible by pp size by @dimapihtar :: PR: #9208
  • Fix P-tuning for Llama based models by @apanteleev :: PR: #9297
  • add deprecation warnings by @pablo-garay :: PR: #9266
  • move pooler under post_process by @dimapihtar :: PR: #9328
  • add deprecation note for nmt by @dimapihtar :: PR: #9342
  • Fix incorrect checkpoint removal logic (#9192) by @mikolajblaz :: PR: #9204
  • fix fp16 precision issue by @dimapihtar :: PR: #9376
  • Fix module.training for Neva in FusedAttn backward which causes nan by @yaoyu-33 :: PR: #8877
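
Among the inference-side tweaks above, #8905 skips nucleus (top-p) filtering whenever `top_p` is 1.0: keeping 100% of the probability mass leaves the distribution unchanged, so the sort and cumulative-sum work is pure overhead. The standalone sketch below illustrates that short-circuit; it is an illustration of the idea, not NeMo's implementation.

```python
import torch

def top_p_filter(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Nucleus (top-p) filtering with the short-circuit from PR #8905: when
    top_p >= 1.0 the full distribution is kept, so the filtering is a no-op.
    Standalone sketch, not NeMo's implementation."""
    if top_p >= 1.0:
        return logits  # keeping 100% of the probability mass changes nothing

    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    sorted_probs = torch.softmax(sorted_logits, dim=-1)
    cum_probs = sorted_probs.cumsum(dim=-1)
    # Drop tokens once the mass accumulated *before* them already exceeds top_p,
    # so the highest-probability token is always kept.
    drop = (cum_probs - sorted_probs) > top_p
    sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))
    filtered = torch.full_like(logits, float("-inf"))
    return filtered.scatter_(-1, sorted_idx, sorted_logits)

logits = torch.randn(2, 8)
assert torch.equal(top_p_filter(logits, top_p=1.0), logits)  # fast path taken
print(top_p_filter(logits, top_p=0.9))
```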

Export

Changelog
  • Updates for TRT-LLM 0.9 by @oyilmaz-nvidia :: PR: #8873
  • Mingyuanm/sdxl export by @Victor49152 :: PR: #8926
  • Avoid unpacking NeMo checkpoints before exporting to TRT-LLM by @apanteleev :: PR: #8866
  • Update gemma for trt-llm 0.9 by @oyilmaz-nvidia :: PR: #8974
  • TRT-LLM export P-tuning related fixes by @apanteleev :: PR: #8863
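
The export changes above track the move to TensorRT-LLM 0.9 and allow exporting a .nemo checkpoint without unpacking it first (#8866). Below is a minimal sketch of the nemo.export path for building and querying an engine; the checkpoint and engine paths are placeholders, and the keyword names are assumptions that may differ between releases.

```python
# Sketch of exporting a NeMo checkpoint to a TensorRT-LLM engine and running it.
# Paths are placeholders; keyword names are assumptions and may differ by release.
from nemo.export import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trtllm_engine")  # where the built engine is written
exporter.export(
    nemo_checkpoint_path="/models/llama2-7b.nemo",  # placeholder checkpoint
    model_type="llama",
    n_gpus=1,
)
print(exporter.forward(["What does the NeMo export module do?"]))
```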

General Improvements

Changelog
  • Update package info by @ericharper :: PR: #8793
  • [Nemo CICD] Update mcore 4.13.24 by @pablo-garay :: PR: #8917
  • Akoumparouli/low mem mixtral ckpt converter by @akoumpa :: PR: #8895
  • Adding RETRO tests to Action Tests (cicd-main.yml) by @huvunvidia :: PR: #8942
  • Akoumparouli/fix sd train 2 by @akoumpa :: PR: #8883
  • Update te install for jenkins by @ericharper :: PR: #8954
  • [Nemo CICD] Add last job depending on others for blocking check by @pablo-garay :: PR: #8959
  • Minor quantization pipeline updates by @janekl :: PR: #8924
  • Fix External CLIP Converter by @yaoyu-33 :: PR: #8960
  • PP support in LoRA merge script by @cuichenx :: PR: #8934
  • Update PR template by @ericharper :: PR: #8978
  • Update Latest News by @shashank3959 :: PR: #8837
  • Fix incorrect link to latest news in README by @shashank3959 :: PR: #8985
  • Update dependency install for LLM and MM by @ericharper :: PR: #8990
  • Temporarily remove mcore dep by @ericharper :: PR: #9010
  • [Nemo CICD] further specialize runners for more parallelism by @pablo-garay :: PR: #9036
  • Update mm dataprep notebook based on feedback by @cuichenx :: PR: #9029
  • Fix import in lora merge script by @cuichenx :: PR: #9032
  • [Nemo CICD] Run when labeled:Run CICD by @pablo-garay :: PR: #9044
  • [Nemo CICD] Add tag/label for 1-gpu runner by @pablo-garay :: PR: #9046
  • [Nemo CICD] checkout v4 by @pablo-garay :: PR: #9048
  • [Nemo CICD] Remove temp test change by @pablo-garay :: PR: #9049
  • remove in-place addition for dreambooth train with text encoder by @Victor49152 :: PR: #8825
  • Mingyuanm/sdxl quantization notebook by @Victor49152 :: PR: #9042
  • [Nemo CICD] Trigger on comment issued by @pablo-garay :: PR: #9062
  • zarr ckpt to torch_dist ckpt converter by @dimapihtar :: PR: #8842
  • Restore PTQ tests for Llama2 (reopened) by @janekl :: PR: #9064
  • add clip H config by @JRD971000 :: PR: #9082
  • [NeMo-UX] Add mixed-precision plugin by @marcromeyn :: PR: #9065
  • Comment baichuan test and update pr template by @ericharper :: PR: #9085
  • Add safe extraction of nemo tar files by @athitten :: PR: #8976
  • Improved shard_id parsing in LazyNemoTarredIterator, enables AIS dataloading by @pzelasko :: PR: #9077
  • [NeMo-UX] Add mistral-7b model by @marcromeyn :: PR: #9066
  • Llama3 Conversion Script Update by @suiyoubi :: PR: #9089
  • dehardcode test string by @JimmyZhang12 :: PR: #8865
  • [Nemo CICD] Try trigger cicd run on comment by @pablo-garay :: PR: #9111
  • Lhotse dataloading: RIR augmentation and nemo/tarred input support for RIR and noise aug by @pzelasko :: PR: #9109
  • mixtral evaluation PR by @Slyne :: PR: #8989
  • [Nemo CICD] Revert: run GHA cicd on comment by @pablo-garay :: PR: #9119
  • [Nemo CICD] Comment out flaky test: running too long by @pablo-garay :: PR: #9123
  • [Nemo CICD] Add timeout to unit tests by @pablo-garay :: PR: #9132
  • [Nemo CICD] Indicate optional test in name (prefix) by @pablo-garay :: PR: #9139
  • video neva null image+video folder path fix by @paul-gibbons :: PR: #9116
  • [NeMo-UX] Add data module by @cuichenx :: PR: #9133
  • NeMo Inference Requirements by @oyilmaz-nvidia :: PR: #9093
  • Remove debug print by @maanug-nv :: PR: #9074
  • Remove legacy CI by @pablo-garay :: PR: #9149
  • Update support for push_to_hf_hub() by @titu1994 :: PR: #9159
  • [Nemo CICD] comment out flaky PTQ tests by @pablo-garay :: PR: #9160
  • Update branch by @ericharper :: PR: #9211
  • dist adam transpose fix by @dimapihtar :: PR: #9239
  • [Nemo CICD] Increase time limit for Speech_Checkpoints_tests (#9186) by @pablo-garay :: PR: #9247
  • Pin transformers by @ericharper :: PR: #9261
  • Fix typo in HF tutorial by @titu1994 :: PR: #9302
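
PR #8976 adds safe extraction of .nemo tar archives. The sketch below shows the general pattern such a guard follows: resolve each member's destination and refuse anything that would escape the target directory (path traversal via ".." or absolute paths) before extracting. It is a generic illustration of the pattern, not NeMo's implementation, and the helper name is hypothetical.

```python
import os
import tarfile

def safe_extract(archive_path: str, out_dir: str) -> None:
    """Extract a tar archive, refusing members that would land outside out_dir.
    Generic illustration of the pattern behind PR #8976; helper name is hypothetical."""
    out_dir = os.path.realpath(out_dir)
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(out_dir, member.name))
            if os.path.commonpath([out_dir, target]) != out_dir:
                raise ValueError(f"Refusing to extract unsafe member: {member.name}")
        tar.extractall(out_dir)

# Example (placeholder paths): safe_extract("model.nemo", "./unpacked_model")
```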
