v1.20.0
Release date: 2023-08-05 03:50:15
Latest NVIDIA/NeMo release: r2.0.0rc1 (2024-08-16 05:55:14)
Highlights
Models
- STT En Fast Conformer CTC XXLarge - 1.2 B param Fast Conformer CTC
- STT En Fast Conformer Transducer XXLarge - 1.2 B param Fast Conformer Transducer
- STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer English
- STT En Fast Conformer CTC XLarge - XLarge Fast Conformer CTC
- STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer Transducer
- STT En Fast Conformer CTC Large - Large Fast Conformer CTC
- STT En Fast Conformer Transducer Large - Large Fast Conformer Transducer
- STT It Fast Conformer Hybrid Large P&C - Large P&C Italian Fast Conformer
- STT Ua Fast Conformer Hybrid Large P&C - Large Ukrainian Fast Conformer
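The checkpoints above are typically loaded by name through the NeMo ASR collection. The following is a minimal sketch, not taken from the release notes: it assumes the pretrained identifiers follow the usual NGC pattern `stt_<lang>_fastconformer_<decoder>_<size>[_pc]`, and the helpers `pretrained_name` and `load_model` are hypothetical names introduced only for illustration. Check the exact identifiers against the NGC catalog before relying on them.

```python
def pretrained_name(lang: str, decoder: str, size: str, pc: bool = False) -> str:
    """Build an ASSUMED NGC identifier such as 'stt_en_fastconformer_ctc_large'.

    The naming pattern is an assumption, not something this changelog states.
    """
    parts = ["stt", lang, "fastconformer", decoder, size]
    if pc:
        # Suffix used here for punctuation-and-capitalization (P&C) variants.
        parts.append("pc")
    return "_".join(parts)


def load_model(name: str):
    """Download and restore a pretrained ASR model by name.

    NeMo is imported lazily so the naming helper works without NeMo installed.
    """
    from nemo.collections.asr.models import ASRModel

    return ASRModel.from_pretrained(model_name=name)
```

A call such as `load_model(pretrained_name("en", "ctc", "large"))` should return a model whose `transcribe` method accepts a list of audio file paths. The XXLarge checkpoints have 1.2 B parameters, so expect a large download and a correspondingly large memory footprint.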
NeMo ASR
- Graph-RNN-T #6168
- WildCard-RNN-T #6168
- Confidence Ensembles for ASR
- Token-and-Duration Transducer (TDT) #6536
- Spellchecking ASR #6179
- Numba FP16 RNNT Loss #6991
NeMo TTS
- TTS Adapter Customization
- TTS Dataloader Framework
NeMo Framework
- LoRA for T5 and mT5 #6612
- Flash Attention integration #6666
- Mosaic 7B compatibility
- Models with LongContext (32K) #6666, #6687, #6773
NeMo Tools
- Speech Data Explorer: Utterance-level ASR model comparison #6669
- Speech Data Processor: Spanish P&C
- NeMo Forced Aligner: Large sequence alignment + memory reduction #6695
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.06
Detailed Changelogs
ASR
Changelog
- [ASR] Adding ssl config for fast-conformer by @krishnacpuvvada :: PR: #6672
- Fix for interctc test random failure by @Kipok :: PR: #6644
- sharded manifests docs by @bmwshop :: PR: #6751
- [TTS] Implement new vocoder dataset by @rlangman :: PR: #6670
- TDT model pull request by @hainan-xv :: PR: #6536
- Spec aug fix by @tbartley94 :: PR: #6775
- Support large inputs to Conformer and Fast Conformer by @bmwshop :: PR: #6556
- sharded manifests updated docs by @bmwshop :: PR: #6833
- added fc-xl, xxl and titanet-s models by @nithinraok :: PR: #6832
- Multi-lookahead cache-aware streaming models by @VahidooX :: PR: #6711
- Update transcribe_utils.py by @stevehuang52 :: PR: #6865
- Fix k2 build topo helper by @artbataev :: PR: #6887
- Fix transcribe_utils.py for hybrid models in partial transcribe mode by @stevehuang52 :: PR: #6899
- Add hybrid model support to transcribe_speech_parallel.py by @stevehuang52 :: PR: #6906
- Update Frame-VAD doc by @stevehuang52 :: PR: #6902
- Make sure asr_model.change_attention_model is run if either cfg.model_path or cfg.pretrained_name is specified by @erastorgueva-nv :: PR: #6908
- Update fvad doc by @stevehuang52 :: PR: #6920
- Online Code Switching Dataset for ASR by @trias702 :: PR: #6579
- Fix AN4 dataset links by @artbataev :: PR: #6926
- Fix confidence ensembles RNNT logprobs selection logic for exclude_blank scenario by @KunalDhawan :: PR: #6937
- Adding cache-aware streaming ASR checkpoints. by @VahidooX :: PR: #6940
- Remove from metrics by @titu1994 :: PR: #6979
- Hybrid conformer export by @borisfom :: PR: #6983
- Cache handling without input tensors mutation by @borisfom :: PR: #6980
- Fixing an issue with confidence ensembles by @Kipok :: PR: #6987
- Add ASR with TTS Tutorial. Fix enhancer usage. by @artbataev :: PR: #6955
- fix install_beamsearch_decoders.sh by @karpnv :: PR: #7019
- Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038
- Fix typo and branch in tutorial by @artbataev :: PR: #7048
- Refined export_config by @borisfom :: PR: #7053
- Fix documentation for Numba by @titu1994 :: PR: #7065
- Adding docs and models for multiple lookahead cache-aware ASR by @VahidooX :: PR: #7067
- Add updated fc ctc and rnnt xxl models by @nithinraok :: PR: #7128
- Update notebook branch by @ericharper :: PR: #7135
- Fixed main and merging this to r1.20 by @tango4j :: PR: #7127
- Fix default context size by @nithinraok :: PR: #7141
- Fix incorrect embedding grads with distopt BF16 grad reductions by @timmoon10 :: PR: #6958
TTS
Changelog
- [TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
- [TTS] Add script for text preprocessing by @rlangman :: PR: #6541
- [TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
- [TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
- [TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
- [TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
- [TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
- Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
- [TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
- [TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012
NLP / NMT
Changelog
- minor fix for missing chat attr by @arendu :: PR: #6671
- eval fix by @arendu :: PR: #6685
- VP Fixes for converter + Config management by @titu1994 :: PR: #6698
- lora notebook by @arendu :: PR: #6765
- peft eval directly from ckpt by @arendu :: PR: #6785
- GPT inference long context by @ekmb :: PR: #6687
- Fix validation with drop_last=False by @mikolajblaz :: PR: #6704
- fix spellmapper tutorial, change branch to main by @bene-ges :: PR: #6803
- text_generation_utils memory reduction if no logprob needed by @yzhang123 :: PR: #6773
- Add optional index mapping dir in mmap text datasets by @gheinrich :: PR: #6683
- Add inference kv cache support for transformer TE path by @yen-shi :: PR: #6627
- add reference to our paper by @bene-ges :: PR: #6821
- added changes to ramp up bs by @dimapihtar :: PR: #6799
- t5 lora tuning by @arendu :: PR: #6612
- Added rouge monitoring support for T5 by @jubick1337 :: PR: #6737
- GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention by @hsiehjackson :: PR: #6666
- Import Enum for chatbot component by @ericharper :: PR: #6877
- typo fix from #6666 by @arendu :: PR: #6882
- removed unnecessary print by @dimapihtar :: PR: #6884
- Fix destructor for delayed mmap dataset case by @mikolajblaz :: PR: #6703
- Make Gradio library optional by @yidong72 :: PR: #6904
- Fix fast-glu activation in change partitions by @hsiehjackson :: PR: #6909
- Documentation for ONNX export of Megatron Models by @asfiyab-nvidia :: PR: #6914
- Fix TextMemMapDataset index file creation in multi-node setup by @gheinrich :: PR: #6768
- Fix flash-attention by @hsiehjackson :: PR: #6901
- ptuning oom fix by @arendu :: PR: #6916
- add rampup bs assertion by @dimapihtar :: PR: #6927
- Enable methods in bert-like models by @sararb :: PR: #6898
- support value attribution condition by @yidong72 :: PR: #6934
- Add missing save restore connector to eval scripts by @titu1994 :: PR: #6935
- Merge release r1.19.0 into main by @ericharper :: PR: #6948
- Stop at the stop token by @yidong72 :: PR: #6957
- fixes for spellmapper by @bene-ges :: PR: #6994
- Fix tabular data text generation by @yidong72 :: PR: #7022
- fix pos id - hf update by @ekmb :: PR: #7075
- fix syntax error introduced in PR-7079 by @bene-ges :: PR: #7102
NeMo Tools
Changelog
- SDE utterance-level comparison by @Jorjeous :: PR: #6669
- hot fix SDE by @Jorjeous :: PR: #6897
Bugfixes
Changelog
- small Bugfix by @fayejf :: PR: #7079
- Fix caching bug in causal convolutions for cache-aware ASR models by @VahidooX :: PR: #7034
- Fix masking bug for TTS Aligner by @redoctopus :: PR: #6677
- [bugfix] avoid the random shuffle of phoneme and tone tokens. by @XuesongYang :: PR: #6855
- fix ptuning residuals bug by @arendu :: PR: #6866
- TE bug fix by @dimapihtar :: PR: #7027
- Update distopt API for coalesced NCCL calls by @timmoon10 :: PR: #6886
General Improvements
Changelog
- update batch size recommendation to min 32 for 43b by @Zhilin123 :: PR: #6675
- Make Note usage consistent in adapter_mixins.py by @BrianMcBrayer :: PR: #6678
- Update all invalid tree references to blobs for NeMo samples by @BrianMcBrayer :: PR: #6679
- Update README.rst about container by @fayejf :: PR: #6686
- karpnv/issues6690 by @karpnv :: PR: #6705
- Limit codeql scope by @titu1994 :: PR: #6710
- Not pinning Gradio version by @yidong72 :: PR: #6680
- preprocess squad in sft format by @arendu :: PR: #6727
- Fix Codeql config by @titu1994 :: PR: #6731
- Fix fastpitch test nightly by @hsiehjackson :: PR: #6730
- Lora/PEFT training script CI test by @arendu :: PR: #6664
- fixed decor to show messages only when the wrapped object is called. by @XuesongYang :: PR: #6793
- lora pp2 by @arendu :: PR: #6818
- Upperbound Numpy to < 1.24 by @titu1994 :: PR: #6829
- Fix typo in documentation by @Dounx :: PR: #6838
- NFA updates by @erastorgueva-nv :: PR: #6695
- Update container for import action by @ericharper :: PR: #6883
- removed some tests by @arendu :: PR: #6900
- Update container info in README.rst by @fayejf :: PR: #6913
- Removed optional optimize_for_inference by @borisfom :: PR: #6933
- Update core commit for CI by @aklife97 :: PR: #6939
- lora inference ci by @arendu :: PR: #6931
- Upgrade base pytorch container to 23.06 by @ericharper :: PR: #6938
- Fix requirements for pydantic + inflect by @titu1994 :: PR: #6956
- Remove pyyaml by @titu1994 :: PR: #7052
- Fix links in Segmentation tutorial by @ekmb :: PR: #7117
- Update evaluator.py by @stevehuang52 :: PR: #7151
1. asset-post-2023-08-forced-alignment-alignment_slots.png 43.82KB
2. asset-post-2023-08-forced-alignment-allowed_seq_ctc.png 56.06KB
3. asset-post-2023-08-forced-alignment-alowed_seq.png 17.7KB
4. asset-post-2023-08-forced-alignment-asr_model.png 90.02KB
5. asset-post-2023-08-forced-alignment-butter_betty_bought_words_aligned.mp4 1.68MB
6. asset-post-2023-08-forced-alignment-ctc_trellis.png 127.39KB
7. asset-post-2023-08-forced-alignment-ctc_viterbi_rule.png 225.73KB
8. asset-post-2023-08-forced-alignment-fold_viterbi.mp4 654.72KB
9. asset-post-2023-08-forced-alignment-naive_graph.mp4 712.72KB
10. asset-post-2023-08-forced-alignment-redundancy_explain.mp4 654.74KB
11. asset-post-2023-08-forced-alignment-redundancy_start_to_end.mp4 2.43MB
12. asset-post-2023-08-forced-alignment-viterbi_rule.png 123.95KB
13. asset-post-2023-08-forced-alignment-what_is_alignment.png 64.71KB
14. asset-post-2023-10-28-numba-fp16-memory_joint.png 559.48KB
15. asset-post-2023-10-28-numba-fp16-rnnt_joint.png 33.71KB
16. nfa_forced_alignment_pipeline.png 129.34KB
17. nfa_run.png 90.16KB
18. nfa_word_segment_alignments.png 133.03KB