v1.14.0

NVIDIA/NeMo

Release date: 2022-12-24 10:49:19

Latest NVIDIA/NeMo release: r2.0.0rc1 (2024-08-16 05:55:14)

Highlights

NeMo ASR

Hybrid CTC + Transducer loss ASR #5364
Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
ASR Adapters hyper parameter search scripts #5159
RNNT {ONNX, TorchScript} x GPU export infer #5248
Exportable MelSpectrogram (TorchScript) #5512
Audio To Audio Dataset Processor #5196
Multi Channel Audio Transcription #5479
Silence Augmentation #5476

NeMo Megatron

Support for the Mixture of Experts for T5
Fix PTL model size output for GPT-3 and BERT
BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

TTS Zh Fastpitch HifiGan SFSpeech

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog

[Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
[ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
Add Silence Augmentation by @fayejf :: PR: #5476
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
[ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
[STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
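
PR #5557 above adds a WER breakdown into insertion, deletion, and substitution rates. As a rough illustration of what those three rates measure, here is a minimal sketch based on a word-level edit-distance table. It is not NeMo's implementation: the function name `wer_details` and the returned dictionary keys are hypothetical, chosen only for this example, and when several alignments tie for the minimum edit count the split between error types depends on the tie-breaking order used here.

```python
from typing import Dict, List, Tuple


def wer_details(reference: str, hypothesis: str) -> Dict[str, float]:
    """Return WER plus insertion, deletion, and substitution rates.

    Hypothetical helper for illustration; not part of the NeMo API.
    All rates are normalized by the number of reference words.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # table[i][j] = (edits, insertions, deletions, substitutions)
    # for one cheapest alignment of ref[:i] with hyp[:j].
    table: List[List[Tuple[int, int, int, int]]] = [
        [(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)
    ]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, j, 0, 0)  # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                # Matching words cost nothing.
                table[i][j] = table[i - 1][j - 1]
                continue
            # Substitution (diagonal move) is the default choice.
            e, n_ins, n_del, n_sub = table[i - 1][j - 1]
            best = (e + 1, n_ins, n_del, n_sub + 1)
            # Deletion: a reference word has no counterpart.
            e, n_ins, n_del, n_sub = table[i - 1][j]
            if e + 1 < best[0]:
                best = (e + 1, n_ins, n_del + 1, n_sub)
            # Insertion: a hypothesis word has no counterpart.
            e, n_ins, n_del, n_sub = table[i][j - 1]
            if e + 1 < best[0]:
                best = (e + 1, n_ins + 1, n_del, n_sub)
            table[i][j] = best

    edits, n_ins, n_del, n_sub = table[len(ref)][len(hyp)]
    n = len(ref)
    return {
        "wer": edits / n,
        "insertion_rate": n_ins / n,
        "deletion_rate": n_del / n,
        "substitution_rate": n_sub / n,
    }


if __name__ == "__main__":
    # One substitution (b -> x) and one insertion (d) against 3 reference words.
    print(wer_details("a b c", "a x c d"))
```

The three rates always sum to the overall WER, so the breakdown shows which kind of error dominates without changing the headline number.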

TTS

Changelog

[TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
[TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
[TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
[TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
[TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
[TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
[TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
[TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
[TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
[TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
[TTS] Add Spanish model documentation by @rlangman :: PR: #5390
[TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
[TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
[TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
[TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
[TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
[TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
[TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
[TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
[TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
[TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
[TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Option to pad the last validation input sequence if it is smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
Fixes bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
Bug fix/gpt by @shanmugamr1992 :: PR: #5493
prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
NLP docs fixes by @vsl9 :: PR: #5528
Switch order of args in optimizer_step override by @ericharper :: PR: #5549
Upgrade to 22.11 by @ericharper :: PR: #5550
Merge r1.13.0 main by @ericharper :: PR: #5570
some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
Remove cell output from tutorial by @ericharper :: PR: #5689

Text Normalization / Inverse Text Normalization

Changelog

[ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Fixes for Conformer-xl export by @borisfom :: PR: #5309
Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

General Improvements

Changelog

bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
Better patch hydra by @titu1994 :: PR: #5591
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
Update perturb.py by @stevehuang52 :: PR: #5231
remove CV requirements. by @XuesongYang :: PR: #5233
checks for accepted adapter type at module level by @arendu :: PR: #5194
fix hypotheses return by @nithinraok :: PR: #5253
Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
created by @bmwshop :: PR: #5268
Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
Upperbound PTL by @titu1994 :: PR: #5302
Update Interface(s) phonetic entry by @blisc :: PR: #5212
add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
Add italian model checkpoints by @Kipok :: PR: #5315
Text Memmap Parsing Improvements by @michalivne :: PR: #5265
Update librosa signature in HF processing script by @titu1994 :: PR: #5321
Force wav file format for audio_filepath by @titu1994 :: PR: #5323
Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
[DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
typo fix by @arendu :: PR: #5328
add precommit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
Fixing de-autocast by @borisfom :: PR: #5319
[Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
[DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
Enable mlflow logger by @whrichd :: PR: #4893
Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
add squad by @arendu :: PR: #5407
added python and c++ alignment code by @yzhang123 :: PR: #5346
Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
Create codeql.yml by @titu1994 :: PR: #5445
Update codeql.yml by @titu1994 :: PR: #5449
Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
Add float32 type casting for get_samples function by @tango4j :: PR: #5399
Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
Add auto-labeler by @SeanNaren :: PR: #5498
Add more glob patterns for labeler by @SeanNaren :: PR: #5504
Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
[BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
Data parallel collect results by @michalivne :: PR: #5547
Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
Fixed Docker build by @borisfom :: PR: #5562
Patch hydra launch by @titu1994 :: PR: #5589
Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
Fixed a missing import for gather_objects by @michalivne :: PR: #5622
