v1.6.0
Release date: 2022-01-29 12:53:25
Latest NVIDIA/NeMo release: r2.0.0rc1 (2024-08-16 05:55:14)
ASR
- Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
- Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
- Move vocabs from asr to common by @Oktai15 :: PR: #3084
- Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
- adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
- Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
- Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
- Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updates on ASR with diarization util files by @tango4j :: PR: #3359
- Asr fr by @tbartley94 :: PR: #3404
- Refactor ASR Examples Directory by @titu1994 :: PR: #3392
- Asr patches by @titu1994 :: PR: #3443
- Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487
TTS
- MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
- Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
- Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
- Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
- Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
- Minor Updates to TTS Finetuning by @blisc :: PR: #3455
NLP / NMT
- NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
- Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
- NMT checkpoint averaging by @michalivne :: PR: #3096
- NMT validation examples with inputs by @michalivne :: PR: #3194
- Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- NLP text augmentation by @michalivne :: PR: #3291
- Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
- Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
- Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- NMT Shared Embeddings Weights by @michalivne :: PR: #3340
- Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
- Byte-level Multilingual NMT by @aklife97 :: PR: #3368
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259
Text Normalization / Inverse Text Normalization
- Tn clean upsample by @yzhang123 :: PR: #3024
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Update english tn ckpt by @yzhang123 :: PR: #3143
- WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
- German TN wfst by @yzhang123 :: PR: #3174
- Add ITN Vietnamese by @binh234 :: PR: #3217
- WFST TN updates by @ekmb :: PR: #3235
- Itn german refactor by @yzhang123 :: PR: #3262
- Tn german deterministic by @yzhang123 :: PR: #3308
- TN updates by @ekmb :: PR: #3285
- Added double digits to EN ITN by @yzhang123 :: PR: #3321
- TN_non_deterministic optimized by @ekmb :: PR: #3343
- Missing init for TN German by @ekmb :: PR: #3355
- Ru TN by @ekmb :: PR: #3390
- Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440
NeMo Tools
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updated NumPy SDE requirement by @vsl9 :: PR: #3442
Export
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Documentation
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Add apex into by @PeganovAnton :: PR: #3214
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
- Doc link fixes by @nithinraok :: PR: #3264
- French ASR Doc updates by @tbartley94 :: PR: #3322
- german asr doc page update by @yzhang123 :: PR: #3325
- update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
- Asr fr by @tbartley94 :: PR: #3404
- Update copyright to 2022 by @ericharper :: PR: #3426
- Update Speech Classification - VAD doc by @fayejf :: PR: #3430
- Update speaker diarization docs by @tango4j :: PR: #3419
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- Add verification helper function and update docs by @nithinraok :: PR: #3514
- Prompt tuning documentation by @vadam5 :: PR: #3541
Bugfixes
- Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
- Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
- Fix README by @ericharper :: PR: #3070
- Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
- Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
- Attribute is not working in . by @PeganovAnton :: PR: #3099
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
- Fixed two typos by @bene-ges :: PR: #3157
- Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
- LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
- Add apex into by @PeganovAnton :: PR: #3214
- Patch omegaconf for cfg by @fayejf :: PR: #3224
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
- Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
- Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
- Doc link fixes by @nithinraok :: PR: #3264
- Escape chars fix by @ekmb :: PR: #3253
- Fix asr output - eval mode by @nithinraok :: PR: #3274
- Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
- Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- Tn en money fix by @yzhang123 :: PR: #3290
- Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
- Adaptive fixed positional embeddings by @michalivne :: PR: #3263
- Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
- Fix for Stalled ASR training/eval on PyTorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
- Fix bucketing list bug. by @VahidooX :: PR: #3315
- Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
- Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
- Fix readme to show cmd by @yzhang123 :: PR: #3345
- Fix speaker label models training convergence by @nithinraok :: PR: #3354
- Tqdm get datasets by @bmwshop :: PR: #3358
- Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
- Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
- Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
- TalkNet Fix by @stasbel :: PR: #3092
- Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
- Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
- Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
- Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- Fix bug for missing variable by @MaximumEntropy :: PR: #3437
- Asr patches by @titu1994 :: PR: #3443
- Prompt tuning loss mask fix by @vadam5 :: PR: #3438
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- Fix hysteresis loading by @MaximumEntropy :: PR: #3460
- Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
- Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
- WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
- Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
- file name fix - Segmentation tutorial by @ekmb :: PR: #3474
- Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
- Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
- Fix description by @PeganovAnton :: PR: #3482
- typo fix in diarization notebooks by @nithinraok :: PR: #3480
- Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
- Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
- Fix link to NGC page for ASR by @titu1994 :: PR: #3512
- vad typo fix by @fayejf :: PR: #3490
- fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
- Fixed section typo by @vadam5 :: PR: #3522
- Fixed duplicate cell bug by @vadam5 :: PR: #3518
- Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
- Fix nmt resume by @ericharper :: PR: #3539
- TN bug fix by @ekmb :: PR: #3538
- Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
- Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552
Improvements
- Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
- Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
- (1) reduce the validation loss within an epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
- Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
- Add new by @PeganovAnton :: PR: #2963
- Add PUBLICATIONS.md by @titu1994 :: PR: #3051
- Hg cache by @yzhang123 :: PR: #3080
- Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
- Add logging to LS script by @titu1994 :: PR: #3141
- Modify speaker input by @nithinraok :: PR: #3100
- Typo correction in README.rst by @satpalsr :: PR: #3103
- Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
- Add AISHELL 2 processing script by @titu1994 :: PR: #3195
- Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
- Reduce number of log files for large runs by @blisc :: PR: #3191
- Add support to modify nemo cache directory by @titu1994 :: PR: #3208
- Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
- Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
- Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
- Support for gecko tool by @nithinraok :: PR: #3266
- Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
- Initial refactor by @borisfom :: PR: #3272
- Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
- Replacing outdated exports scripts by @borisfom :: PR: #3311
- Batch implementation by @dimapihtar :: PR: #3276
- Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
- Add titanet by @nithinraok :: PR: #3333
- update sparrowhawk export grammars to be able to skip pynini by @yzhang123 :: PR: #3346
- Prompt tuning by @vadam5 :: PR: #3309
- Remove wordninja by @ekmb :: PR: #3363
- Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
- Moved shebangs to the first line by @davidalami :: PR: #3361
- Added new method for logprobs computation by @dimapihtar :: PR: #3329
- Update speaker collate functions by @nithinraok :: PR: #3381
- Cache_hf by @ekmb :: PR: #3406
- Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
- Working around PyTorch exporter issue with expand() by @borisfom :: PR: #3422
- Remove apex by @ekmb :: PR: #3428
- Vad infer refactor by @fayejf :: PR: #3394
- Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
- Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
- TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
- Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
- Refactor data preprocessing script by @yzhang123 :: PR: #3444
- Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
- Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
- Add Apex import guard by @ericharper :: PR: #3467
- Adding missing init files by @yzhang123 :: PR: #3505
- Typos by @ekmb :: PR: #3504
- Update titanet conf by @nithinraok :: PR: #3507
- Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
- Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
- Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
- WFST tutorial update by @tbartley94 :: PR: #3531
- Update nvidia container check by @ericharper :: PR: #3535
- Remove extra instance during restore by @ericharper :: PR: #3551
- Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477