v1.20.0

NVIDIA/NeMo

Release date: 2023-08-05 03:50:15

Latest NVIDIA/NeMo release: r2.0.0rc1 (2024-08-16 05:55:14)

Highlights

Models

STT En Fast Conformer CTC XXLarge - 1.2B-parameter Fast Conformer CTC
STT En Fast Conformer Transducer XXLarge - 1.2B-parameter Fast Conformer Transducer
STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer English
STT En Fast Conformer CTC XLarge - XLarge Fast Conformer CTC
STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer Transducer
STT En Fast Conformer CTC Large - Large Fast Conformer CTC
STT En Fast Conformer Transducer Large - Large Fast Conformer Transducer
STT It Fast Conformer Hybrid Large P&C - Large P&C Italian Fast Conformer
STT Ua Fast Conformer Hybrid Large P&C - Large Ukrainian Fast Conformer

NeMo ASR

Graph-RNN-T #6168
WildCard-RNN-T #6168
Confidence Ensembles for ASR
Token-and-Duration Transducer (TDT) #6536
Spellchecking ASR #6179
Numba FP16 RNNT Loss #6991

NeMo TTS

TTS Adapter Customization
TTS Dataloader Framework

NeMo Framework

LoRA for T5 and mT5 #6612
Flash Attention integration #6666
Mosaic 7B compatibility
Models with LongContext (32K) #6666, #6687, #6773

NeMo Tools

Speech Data Explorer: Utterance-level ASR model comparison #6669
Speech Data Processor: Spanish P&C
NeMo Forced Aligner: Large sequence alignment + memory reduction #6695

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.06

Detailed Changelogs

ASR

Changelog

[ASR] Adding ssl config for fast-conformer by @krishnacpuvvada :: PR: #6672
Fix for interctc test random failure by @Kipok :: PR: #6644
sharded manifests docs by @bmwshop :: PR: #6751
[TTS] Implement new vocoder dataset by @rlangman :: PR: #6670
TDT model pull request by @hainan-xv :: PR: #6536
Spec aug fix by @tbartley94 :: PR: #6775
Support large inputs to Conformer and Fast Conformer by @bmwshop :: PR: #6556
sharded manifests updated docs by @bmwshop :: PR: #6833
added fc-xl, xxl and titanet-s models by @nithinraok :: PR: #6832
Multi-lookahead cache-aware streaming models by @VahidooX :: PR: #6711
Update transcribe_utils.py by @stevehuang52 :: PR: #6865
Fix k2 build topo helper by @artbataev :: PR: #6887
Fix transcribe_utils.py for hybrid models in partial transcribe mode by @stevehuang52 :: PR: #6899
Add hybrid model support to transcribe_speech_parallel.py by @stevehuang52 :: PR: #6906
Update Frame-VAD doc by @stevehuang52 :: PR: #6902
Make sure asr_model.change_attention_model is run if either cfg.model_path or cfg.pretrained_name is specified by @erastorgueva-nv :: PR: #6908
Update fvad doc by @stevehuang52 :: PR: #6920
Online Code Switching Dataset for ASR by @trias702 :: PR: #6579
Fix AN4 dataset links by @artbataev :: PR: #6926
Fix confidence ensembles RNNT logprobs selection logic for exclude_blank scenario by @KunalDhawan :: PR: #6937
Adding cache-aware streaming ASR checkpoints. by @VahidooX :: PR: #6940
Remove from metrics by @titu1994 :: PR: #6979
Hybrid conformer export by @borisfom :: PR: #6983
Cache handling without input tensors mutation by @borisfom :: PR: #6980
Fixing an issue with confidence ensembles by @Kipok :: PR: #6987
Add ASR with TTS Tutorial. Fix enhancer usage. by @artbataev :: PR: #6955
fix install_beamsearch_decoders.sh by @karpnv :: PR: #7019
Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038
Fix typo and branch in tutorial by @artbataev :: PR: #7048
Refined export_config by @borisfom :: PR: #7053
Fix documentation for Numba by @titu1994 :: PR: #7065
Adding docs and models for multiple lookahead cache-aware ASR by @VahidooX :: PR: #7067
Add updated fc ctc and rnnt xxl models by @nithinraok :: PR: #7128
Update notebook branch by @ericharper :: PR: #7135
Fixed main and merging this to r1.20 by @tango4j :: PR: #7127
Fix default context size by @nithinraok :: PR: #7141
Fix incorrect embedding grads with distopt BF16 grad reductions by @timmoon10 :: PR: #6958

TTS

Changelog

[TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
[TTS] Add script for text preprocessing by @rlangman :: PR: #6541
[TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
[TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
[TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
[TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
[TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
[TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
[TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012

NLP / NMT

Changelog

minor fix for missing chat attr by @arendu :: PR: #6671
eval fix by @arendu :: PR: #6685
VP Fixes for converter + Config management by @titu1994 :: PR: #6698
lora notebook by @arendu :: PR: #6765
peft eval directly from ckpt by @arendu :: PR: #6785
GPT inference long context by @ekmb :: PR: #6687
Fix validation with drop_last=False by @mikolajblaz :: PR: #6704
fix spellmapper tutorial, change branch to main by @bene-ges :: PR: #6803
text_generation_utils memory reduction if no logprob needed by @yzhang123 :: PR: #6773
Add optional index mapping dir in mmap text datasets by @gheinrich :: PR: #6683
Add inference kv cache support for transformer TE path by @yen-shi :: PR: #6627
add reference to our paper by @bene-ges :: PR: #6821
added changes to ramp up bs by @dimapihtar :: PR: #6799
t5 lora tuning by @arendu :: PR: #6612
Added rouge monitoring support for T5 by @jubick1337 :: PR: #6737
GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention by @hsiehjackson :: PR: #6666
Import Enum for chatbot component by @ericharper :: PR: #6877
typo fix from #6666 by @arendu :: PR: #6882
removed unnecessary print by @dimapihtar :: PR: #6884
Fix destructor for delayed mmap dataset case by @mikolajblaz :: PR: #6703
Make Gradio library optional by @yidong72 :: PR: #6904
Fix fast-glu activation in change partitions by @hsiehjackson :: PR: #6909
Documentation for ONNX export of Megatron Models by @asfiyab-nvidia :: PR: #6914
Fix TextMemMapDataset index file creation in multi-node setup by @gheinrich :: PR: #6768
Fix flash-attention by @hsiehjackson :: PR: #6901
ptuning oom fix by @arendu :: PR: #6916
add rampup bs assertion by @dimapihtar :: PR: #6927
Enable methods in bert-like models by @sararb :: PR: #6898
support value attribution condition by @yidong72 :: PR: #6934
Add missing save restore connector to eval scripts by @titu1994 :: PR: #6935
Merge release r1.19.0 into main by @ericharper :: PR: #6948
Stop at the stop token by @yidong72 :: PR: #6957
fixes for spellmapper by @bene-ges :: PR: #6994
Fix tabular data text generation by @yidong72 :: PR: #7022
fix pos id - hf update by @ekmb :: PR: #7075
fix syntax error introduced in PR-7079 by @bene-ges :: PR: #7102

NeMo Tools

Changelog

SDE unt lvl comparison by @Jorjeous :: PR: #6669
hot fix SDE by @Jorjeous :: PR: #6897

Bugfixes

Changelog

small Bugfix by @fayejf :: PR: #7079
Fix caching bug in causal convolutions for cache-aware ASR models by @VahidooX :: PR: #7034
Fix masking bug for TTS Aligner by @redoctopus :: PR: #6677
[bugfix] avoid the random shuffle of phoneme and tone tokens. by @XuesongYang :: PR: #6855
fix ptuning residuals bug by @arendu :: PR: #6866
TE bug fix by @dimapihtar :: PR: #7027
Update distopt API for coalesced NCCL calls by @timmoon10 :: PR: #6886

General Improvements

Changelog

update batch size recommendation to min 32 for 43b by @Zhilin123 :: PR: #6675
Make Note usage consistent in adapter_mixins.py by @BrianMcBrayer :: PR: #6678
Update all invalid tree references to blobs for NeMo samples by @BrianMcBrayer :: PR: #6679
Update README.rst about container by @fayejf :: PR: #6686
karpnv/issues6690 by @karpnv :: PR: #6705
Limit codeql scope by @titu1994 :: PR: #6710
Not pinning Gradio version by @yidong72 :: PR: #6680
preprocess squad in sft format by @arendu :: PR: #6727
Fix Codeql config by @titu1994 :: PR: #6731
Fix fastpitch test nightly by @hsiehjackson :: PR: #6730
Lora/PEFT training script CI test by @arendu :: PR: #6664
fixed decor to show messages only when the wrapped object is called. by @XuesongYang :: PR: #6793
lora pp2 by @arendu :: PR: #6818
Upperbound Numpy to < 1.24 by @titu1994 :: PR: #6829
Fix typo in documentation by @Dounx :: PR: #6838
NFA updates by @erastorgueva-nv :: PR: #6695
Update container for import action by @ericharper :: PR: #6883
removed some tests by @arendu :: PR: #6900
Update container info in README.rst by @fayejf :: PR: #6913
Removed optional optimize_for_inference by @borisfom :: PR: #6933
Update core commit for CI by @aklife97 :: PR: #6939
lora inference ci by @arendu :: PR: #6931
Upgrade base pytorch container to 23.06 by @ericharper :: PR: #6938
Fix requirements for pydantic + inflect by @titu1994 :: PR: #6956
Remove pyyaml by @titu1994 :: PR: #7052
Fix links in Segmentation tutorial by @ekmb :: PR: #7117
Update evaluator.py by @stevehuang52 :: PR: #7151
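
The changelog entries above share one shape: a title, then "by @author", then ":: PR: #number". For readers who want to tally or filter these entries, here is a minimal sketch that assumes every entry follows that shape. The names ENTRY_RE and parse_entry are hypothetical helpers introduced for illustration; they are not part of NeMo.

```python
import re
from collections import Counter

# Assumed entry shape: "<title> by @<author> :: PR: #<number>".
# The greedy title group means the LAST " by @" on the line marks the author,
# so titles that themselves contain "(#6991)" or similar still parse.
ENTRY_RE = re.compile(r"^(?P<title>.+) by @(?P<author>[\w-]+) :: PR: #(?P<pr>\d+)$")


def parse_entry(line):
    """Return (title, author, pr_number), or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return match["title"], match["author"], int(match["pr"])


# Three entries copied from the ASR changelog above.
entries = [
    "TDT model pull request by @hainan-xv :: PR: #6536",
    "Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038",
    "Fix documentation for Numba by @titu1994 :: PR: #7065",
]

parsed = [parse_entry(entry) for entry in entries]

# Count entries per author among the parsed lines.
authors = Counter(author for _, author, _ in parsed)
print(authors.most_common(1))
```

Section headings and the bare "Changelog" lines do not match the pattern, so parse_entry returns None for them and they can be filtered out before counting.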