v1.14.0
Release date: 2022-12-24 10:49:19
Highlights
NeMo ASR
- Hybrid CTC + Transducer loss ASR #5364
- Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
- ASR Adapters hyper parameter search scripts #5159
- RNNT {ONNX, TorchScript} x GPU export infer #5248
- Exportable MelSpectrogram (TorchScript) #5512
- Audio To Audio Dataset Processor #5196
- Multi Channel Audio Transcription #5479
- Silence Augmentation #5476
NeMo Megatron
- Support for the Mixture of Experts for T5
- Fix PTL model size output for GPT-3 and BERT
- BERT with Tensor Parallelism & Pipeline Parallel Support
NeMo Core
- Hydra Multirun core support + NeMo HP optim in YAML #5159
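Hydra's multirun mode (`--multirun`, or `-m`) launches one job per combination of comma-separated override values. For example, `python train.py --multirun a=1,2 b=x,y` runs four jobs, where `train.py` is a placeholder script. The sketch below shows that grid expansion with `itertools.product`. `expand_sweep` is a hypothetical helper written for illustration. It is not Hydra's or NeMo's code, and the override keys in the demo are placeholders.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(overrides: List[str]) -> List[Dict[str, str]]:
    """Expand 'key=v1,v2,...' overrides into one override dict per job.

    Hypothetical helper illustrating a grid sweep (Cartesian product).
    It ignores Hydra's richer sweep syntax (ranges, choice(), quoting).
    """
    keys: List[str] = []
    value_lists: List[List[str]] = []
    for item in overrides:
        key, _, raw = item.partition("=")
        keys.append(key)
        value_lists.append(raw.split(","))
    # One job per combination; the last key varies fastest.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    # Placeholder override names, not taken from a real NeMo config.
    for job in expand_sweep(["model.optim.lr=0.1,0.01", "trainer.max_epochs=5,10"]):
        print(job)
```

A hyperparameter search over N values of one key and M values of another therefore costs N × M training runs. This is why the notes pair multirun support with HP-optimization configs kept in YAML.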
NeMo Models
Detailed Changelogs
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:22.11
ASR
Changelog
- [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
- Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
- Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
- Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
- Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
- bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
- Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
- Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
- Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
- Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
- [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
- Add Silence Augmentation by @fayejf :: PR: #5476
- add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
- add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
- [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
- Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
- Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
- Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
- Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
- Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
- Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
- [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
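The WER-details entry (#5557) reports insertion, deletion and substitution rates alongside WER. The sketch below shows how such a breakdown can be derived from a word-level Levenshtein alignment. It is a minimal illustration, not NeMo's implementation, and `wer_details` is a hypothetical name. When several alignments have the same minimal cost, the per-type counts can differ between tools even though the total WER agrees.

```python
from typing import Dict, List, Tuple

# (total_edits, substitutions, deletions, insertions)
Cell = Tuple[int, int, int, int]


def wer_details(reference: str, hypothesis: str) -> Dict[str, float]:
    """WER plus insertion/deletion/substitution rates, all divided by len(reference)."""
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] holds the cheapest way to turn ref[:i] into hyp[:j].
    dp: List[List[Cell]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no cost
                continue
            t, s, d, k = dp[i - 1][j - 1]
            sub = (t + 1, s + 1, d, k)
            t, s, d, k = dp[i - 1][j]
            dele = (t + 1, s, d + 1, k)
            t, s, d, k = dp[i][j - 1]
            ins = (t + 1, s, d, k + 1)
            # Tuples compare by total edits first; ties fall back to the counts.
            dp[i][j] = min(sub, dele, ins)
    total, subs, dels, inss = dp[n][m]
    denom = max(n, 1)  # avoid dividing by zero for an empty reference
    return {
        "wer": total / denom,
        "sub_rate": subs / denom,
        "del_rate": dels / denom,
        "ins_rate": inss / denom,
        "substitutions": subs,
        "deletions": dels,
        "insertions": inss,
    }


if __name__ == "__main__":
    print(wer_details("a b c d", "a x c d e"))
```

Because all three rates share the reference length as denominator, `wer == sub_rate + del_rate + ins_rate` holds for every input.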
TTS
Changelog
- [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
- [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
- [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
- [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
- [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
- [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
- Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
- [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
- [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
- [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
- [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
- [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
- [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
- [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
- [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
- [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
- [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
- JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
- [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
- TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
- [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
- [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
- [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
- [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
- [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
- [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666
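Speaker-level pitch normalization (#5455) presumably replaces one global pitch mean/std with statistics computed per speaker. The minimal sketch below assumes each pitch contour is a list of Hz values with unvoiced frames marked as 0.0. The helper names are hypothetical, and this is not NeMo's implementation.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def speaker_pitch_stats(
    utterances: Iterable[Tuple[str, List[float]]],
) -> Dict[str, Tuple[float, float]]:
    """Return (mean, population std) of voiced pitch per speaker.

    Frames with pitch 0.0 are treated as unvoiced and ignored (an assumption).
    """
    voiced: Dict[str, List[float]] = defaultdict(list)
    for speaker, pitch in utterances:
        voiced[speaker].extend(p for p in pitch if p > 0.0)
    # Speakers with no voiced frames are skipped: fmean() fails on empty data.
    return {
        spk: (statistics.fmean(vals), statistics.pstdev(vals))
        for spk, vals in voiced.items()
        if vals
    }


def normalize_pitch(pitch: List[float], mean: float, std: float) -> List[float]:
    """Z-normalize voiced frames with one speaker's stats; unvoiced frames stay 0.0."""
    scale = std if std > 0.0 else 1.0  # guard against a constant-pitch speaker
    return [(p - mean) / scale if p > 0.0 else 0.0 for p in pitch]


if __name__ == "__main__":
    data = [("spk_a", [100.0, 0.0, 120.0]), ("spk_b", [200.0, 220.0])]
    stats = speaker_pitch_stats(data)
    for speaker, pitch in data:
        print(speaker, normalize_pitch(pitch, *stats[speaker]))
```

With per-speaker statistics, a low-pitched and a high-pitched speaker both end up centered around zero. The model can then learn relative intonation rather than each speaker's absolute register.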
NLP / NMT
Changelog
- Option to pad the last validation input sequence if it's smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
- Fix bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
- Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
- Support disabling the sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
- [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
- Bug fix/gpt by @shanmugamr1992 :: PR: #5493
- prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
- Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
- NLP docs fixes by @vsl9 :: PR: #5528
- Switch order of args in optimizer_step override by @ericharper :: PR: #5549
- Upgrade to 22.11 by @ericharper :: PR: #5550
- Merge r1.13.0 main by @ericharper :: PR: #5570
- some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
- Remove cell output from tutorial by @ericharper :: PR: #5689
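The notes do not say how the loss-averaging fixes (#5329, #5424) were implemented. The sketch below only illustrates the edge case named in #5424. A masked mean divides by the number of selected tokens, so a loss mask that is all zeros yields 0/0. `masked_mean_loss` is a hypothetical helper, and returning 0.0 for an empty mask is one possible convention rather than Megatron's or NeMo's behavior.

```python
from typing import Sequence


def masked_mean_loss(losses: Sequence[float], mask: Sequence[int]) -> float:
    """Average per-token losses over positions where mask == 1.

    Returns 0.0 when the mask selects no tokens instead of dividing by zero.
    """
    if len(losses) != len(mask):
        raise ValueError("losses and mask must have the same length")
    total = sum(loss * keep for loss, keep in zip(losses, mask))
    count = sum(mask)
    if count == 0:
        # All-zero mask: nothing to average. A trainer might instead skip
        # the batch; 0.0 is just one convention (an assumption here).
        return 0.0
    return total / count


if __name__ == "__main__":
    print(masked_mean_loss([1.0, 2.0, 3.0], [1, 0, 1]))
    print(masked_mean_loss([1.0, 2.0], [0, 0]))
```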
Text Normalization / Inverse Text Normalization
Changelog
- [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
- [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
Export
Changelog
- Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
- Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
- Fixes for Conformer-xl export by @borisfom :: PR: #5309
- Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
- add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
General Improvements
Changelog
- bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
- Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
- Better patch hydra by @titu1994 :: PR: #5591
- [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
- Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
- Update perturb.py by @stevehuang52 :: PR: #5231
- remove CV requirements. by @XuesongYang :: PR: #5233
- checks for accepted adapter type at module level by @arendu :: PR: #5194
- fix hypotheses return by @nithinraok :: PR: #5253
- Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
- update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
- Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
- created by @bmwshop :: PR: #5268
- Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
- O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
- Upperbound PTL by @titu1994 :: PR: #5302
- Update Interface(s) phonetic entry by @blisc :: PR: #5212
- add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
- Add italian model checkpoints by @Kipok :: PR: #5315
- Text Memmap Parsing Improvements by @michalivne :: PR: #5265
- Update librosa signature in HF processing script by @titu1994 :: PR: #5321
- Force wav file format for audio_filepath by @titu1994 :: PR: #5323
- Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
- [DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
- Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
- typo fix by @arendu :: PR: #5328
- add pre-commit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
- Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
- Fixing de-autocast by @borisfom :: PR: #5319
- [Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
- [DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
- Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
- removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
- Enable mlflow logger by @whrichd :: PR: #4893
- Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
- Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
- SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
- add squad by @arendu :: PR: #5407
- added python and c++ alignment code by @yzhang123 :: PR: #5346
- Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
- Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
- Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
- update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
- Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
- Create codeql.yml by @titu1994 :: PR: #5445
- Update codeql.yml by @titu1994 :: PR: #5449
- Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
- Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
- Add float32 type casting for get_samples function by @tango4j :: PR: #5399
- Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
- Add auto-labeler by @SeanNaren :: PR: #5498
- Add more glob patterns for labeler by @SeanNaren :: PR: #5504
- Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
- [BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
- Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
- Data parallel collect results by @michalivne :: PR: #5547
- Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
- Fixed Docker build by @borisfom :: PR: #5562
- Patch hydra launch by @titu1994 :: PR: #5589
- Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
- Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
- Fixed a missing import for gather_objects by @michalivne :: PR: #5622
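The Text Memmap Parsing entry (#5265) gives no details. The general technique behind memory-mapped text datasets is to map the file into memory once, record the byte offset where each line starts, and later slice out any line without reading the whole file. The minimal sketch below uses the standard `mmap` module. `build_line_index` and `read_line` are hypothetical helpers, and this is not NeMo's dataset code.

```python
import mmap
import os
import tempfile
from typing import List


def build_line_index(path: str) -> List[int]:
    """Return the byte offset at which each line of `path` starts."""
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    offsets = [0]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b"\n")
        while pos != -1:
            offsets.append(pos + 1)
            pos = mm.find(b"\n", pos + 1)
        size = len(mm)
    if offsets[-1] == size:  # trailing newline: no line starts at EOF
        offsets.pop()
    return offsets


def read_line(path: str, offsets: List[int], index: int) -> str:
    """Decode line `index` using the precomputed offsets (no full-file read)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = offsets[index]
        end = mm.find(b"\n", start)
        if end == -1:
            end = len(mm)  # last line without a trailing newline
        return mm[start:end].decode("utf-8")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"first\nsecond\nthird\n")
    index = build_line_index(tmp.name)
    print(index, read_line(tmp.name, index, 1))
    os.remove(tmp.name)
```

The index is built once per file, so random access during training costs one slice per sample. A real dataset would typically cache the offsets on disk and keep the mapping open rather than reopening the file per read as this sketch does.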