v1.6.0

NVIDIA/NeMo

Release date: 2022-01-29 12:53:25

Latest NVIDIA/NeMo release: r2.0.0rc1 (2024-08-16 05:55:14)

ASR

Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
Move vocabs from asr to common by @Oktai15 :: PR: #3084
Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
Adding German Conformer CTC and RNNT by @yzhang123 :: PR: #3242
Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
Updates on ASR with diarization util files by @tango4j :: PR: #3359
Asr fr by @tbartley94 :: PR: #3404
Refactor ASR Examples Directory by @titu1994 :: PR: #3392
Asr patches by @titu1994 :: PR: #3443
Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487

TTS

MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
Minor Updates to TTS Finetuning by @blisc :: PR: #3455

NLP / NMT

NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
NMT checkpoint averaging by @michalivne :: PR: #3096
NMT validation examples with inputs by @michalivne :: PR: #3194
Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
NLP text augmentation by @michalivne :: PR: #3291
Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
NMT MIM mean variance fix by @michalivne :: PR: #3385
NMT Shared Embeddings Weights by @michalivne :: PR: #3340
Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
Byte-level Multilingual NMT by @aklife97 :: PR: #3368
BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
(1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259

Text Normalization / Inverse Text Normalization

Tn clean upsample by @yzhang123 :: PR: #3024
Tn add nn wfst and doc by @yzhang123 :: PR: #3135
Update english tn ckpt by @yzhang123 :: PR: #3143
WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
German TN wfst by @yzhang123 :: PR: #3174
Add ITN Vietnamese by @binh234 :: PR: #3217
WFST TN updates by @ekmb :: PR: #3235
Itn german refactor by @yzhang123 :: PR: #3262
Tn german deterministic by @yzhang123 :: PR: #3308
TN updates by @ekmb :: PR: #3285
Added double digits to EN ITN by @yzhang123 :: PR: #3321
TN_non_deterministic optimized by @ekmb :: PR: #3343
Missing init for TN German by @ekmb :: PR: #3355
Ru TN by @ekmb :: PR: #3390
Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440

NeMo Tools

CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
Updated NumPy SDE requirement by @vsl9 :: PR: #3442

Export

ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072

Documentation

Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
Tn add nn wfst and doc by @yzhang123 :: PR: #3135
Add apex into by @PeganovAnton :: PR: #3214
Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
Doc link fixes by @nithinraok :: PR: #3264
French ASR Doc updates by @tbartley94 :: PR: #3322
German ASR doc page update by @yzhang123 :: PR: #3325
update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
Asr fr by @tbartley94 :: PR: #3404
Update Speech Classification - VAD doc by @fayejf :: PR: #3430
Update speaker diarization docs by @tango4j :: PR: #3419
NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
Add verification helper function and update docs by @nithinraok :: PR: #3514
Prompt tuning documentation by @vadam5 :: PR: #3541

Bugfixes

Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
Fix README by @ericharper :: PR: #3070
Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
Attribute is not working in . by @PeganovAnton :: PR: #3099
Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
Fixed two typos by @bene-ges :: PR: #3157
Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
Add apex into by @PeganovAnton :: PR: #3214
Patch omegaconf for cfg by @fayejf :: PR: #3224
Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
Doc link fixes by @nithinraok :: PR: #3264
Escape chars fix by @ekmb :: PR: #3253
Fix asr output - eval mode by @nithinraok :: PR: #3274
Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
Tn en money fix by @yzhang123 :: PR: #3290
Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
Adaptive fixed positional embeddings by @michalivne :: PR: #3263
Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
Fix bucketing list bug. by @VahidooX :: PR: #3315
Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
Fix readme to show cmd by @yzhang123 :: PR: #3345
Fix speaker label models training convergence by @nithinraok :: PR: #3354
Tqdm get datasets by @bmwshop :: PR: #3358
Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
TalkNet Fix by @stasbel :: PR: #3092
Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
NMT MIM mean variance fix by @michalivne :: PR: #3385
Fix bug for missing variable by @MaximumEntropy :: PR: #3437
Asr patches by @titu1994 :: PR: #3443
Prompt tuning loss mask fix by @vadam5 :: PR: #3438
BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
Fix hysteresis loading by @MaximumEntropy :: PR: #3460
Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
file name fix - Segmentation tutorial by @ekmb :: PR: #3474
Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
Fix description by @PeganovAnton :: PR: #3482
typo fix in diarization notebooks by @nithinraok :: PR: #3480
Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
Fix link to NGC page for ASR by @titu1994 :: PR: #3512
vad typo fix by @fayejf :: PR: #3490
fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
Fixed section typo by @vadam5 :: PR: #3522
Fixed duplicate cell bug by @vadam5 :: PR: #3518
Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
Fix nmt resume by @ericharper :: PR: #3539
TN bug fix by @ekmb :: PR: #3538
Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552

Improvements

Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
(1) reduce the validation loss within an epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
Add new by @PeganovAnton :: PR: #2963
Add PUBLICATIONS.md by @titu1994 :: PR: #3051
Hg cache by @yzhang123 :: PR: #3080
Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
Add logging to LS script by @titu1994 :: PR: #3141
Modify speaker input by @nithinraok :: PR: #3100
Typo correction in README.rst by @satpalsr :: PR: #3103
Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
Add AISHELL 2 processing script by @titu1994 :: PR: #3195
Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
Reduce number of log files for large runs by @blisc :: PR: #3191
Add support to modify nemo cache directory by @titu1994 :: PR: #3208
Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
Support for gecko tool by @nithinraok :: PR: #3266
Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
Initial refactor by @borisfom :: PR: #3272
Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
Replacing outdated exports scripts by @borisfom :: PR: #3311
Batch implementation by @dimapihtar :: PR: #3276
Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
Add titanet by @nithinraok :: PR: #3333
Update sparrowhawk export grammars to be able to skip pynini by @yzhang123 :: PR: #3346
Prompt tuning by @vadam5 :: PR: #3309
Remove wordninja by @ekmb :: PR: #3363
Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
Moved shebangs to the first line by @davidalami :: PR: #3361
Added new method for logprobs computation by @dimapihtar :: PR: #3329
Update speaker collate functions by @nithinraok :: PR: #3381
Cache_hf by @ekmb :: PR: #3406
Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
Working around Pytorch exporter issue with expand() by @borisfom :: PR: #3422
Remove apex by @ekmb :: PR: #3428
Vad infer refactor by @fayejf :: PR: #3394
Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
Refactor data preprocessing script by @yzhang123 :: PR: #3444
Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
Add Apex import guard by @ericharper :: PR: #3467
Adding missing init files by @yzhang123 :: PR: #3505
Typos by @ekmb :: PR: #3504
Update titanet conf by @nithinraok :: PR: #3507
Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
WFST tutorial update by @tbartley94 :: PR: #3531
Update nvidia container check by @ericharper :: PR: #3535
Remove extra instance during restore by @ericharper :: PR: #3551
Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477
