v1.22.0

NVIDIA/NeMo

Release date: 2024-01-11 10:04:18

Highlights

Models

NeMo Parakeet

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/

NeMo Parakeet-TDT

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet-tdt/

ASR

NeMo ASR

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.10

Detailed Changelogs

ASR

Changelog
  • Fix missing pip package 'einops' by @RobinDong :: PR: #7397
  • Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
  • [ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
  • RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
  • Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
  • [TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
  • Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
  • Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
  • add fc large ls models by @nithinraok :: PR: #7641
  • [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
  • Create per.py by @ssh-meister :: PR: #7538
  • Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
  • [ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
  • Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
  • Replace gpus with devices by @athitten :: PR: #7743
  • docs: fix typos by @shuoer86 :: PR: #7758
  • Snake act by @nithinraok :: PR: #7736
  • fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
  • Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
  • remove TN from ctc_segm tut by @ekmb :: PR: #7807
  • Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
  • Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
  • Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
  • [ASR] GSS-based mask estimator by @anteju :: PR: #7849
  • add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
  • Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
  • update branch name by @nithinraok :: PR: #7990
  • fix librosa display issue by @nithinraok :: PR: #7991
  • Fixes Notebooks for ASR by @titu1994 :: PR: #7994
  • cherry pick bug 4405781 by @karpnv :: PR: #8044
  • fix noise augmentation by @stevehuang52 :: PR: #8056
  • Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
  • run with non-dev option by @nithinraok :: PR: #8077
  • update broken links by @nithinraok :: PR: #8079
  • langid bug fix by @karpnv :: PR: #8134
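PR #7736 ("Snake act") adds a Snake activation. The following minimal sketch assumes the standard Snake form, f(x) = x + (1/alpha) * sin^2(alpha * x). NeMo's module may differ in details (for example a learnable per-channel alpha or an epsilon guard), so treat this as an illustration of the formula, not the library code. Plain Python lists stand in for tensors.

```python
import math


def snake(xs, alpha=1.0):
    """Snake activation applied elementwise: x + (1 / alpha) * sin(alpha * x) ** 2."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    # The identity term keeps the function monotone-ish; the sin^2 term adds
    # a periodic component whose frequency is controlled by alpha.
    return [x + (math.sin(alpha * x) ** 2) / alpha for x in xs]


print(snake([0.0, math.pi / 2, -1.0], alpha=1.0))
```

Because the periodic term is a squared sine, it is never negative for positive alpha, so the output stays on or above the identity line.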

TTS

Changelog
  • Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
  • Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
  • Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
  • [TTS] Fix audio codec type checks by @rlangman :: PR: #7373
  • [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
  • Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
  • Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
  • [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
  • add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
  • Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
  • add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
  • [TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
  • Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
  • Group-residual vector quantizer by @anteju :: PR: #7643
  • French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
  • add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
  • Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
  • ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
  • Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
  • [Codec] Update codec checkpoint config by @anteju :: PR: #7835
  • [Codec] Finite scalar quantizer by @anteju :: PR: #7886
  • Tar codec by @nithinraok :: PR: #7867
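PR #7468 adds an SI-SDR loss to the audio codec recipe. The following sketch uses the common scale-invariant SDR definition: both signals are made zero-mean, the estimate is projected onto the target, and the ratio of projected energy to residual energy is reported in dB. The loss is the negated value. The function names and the `eps` stabilizer are illustrative assumptions, not NeMo's API.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and the same length")
    n = len(target)
    # Remove the mean so a DC offset does not count as signal or distortion.
    est_mean, tgt_mean = sum(estimate) / n, sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target: alpha * target is the "true" part.
    alpha = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [alpha * t for t in tgt]
    residual = [e - p for e, p in zip(est, projection)]
    ratio = (sum(p * p for p in projection) + eps) / (sum(r * r for r in residual) + eps)
    return 10 * math.log10(ratio)


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


reference = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 0.5, -1.5]  # reference plus a small orthogonal noise term
print(si_sdr(noisy, reference), si_sdr([3 * x for x in noisy], reference))
```

Rescaling the estimate leaves the value unchanged, which is the property that distinguishes SI-SDR from plain SNR.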

LLM

Changelog
  • Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
  • Add comprehensive error messages by @PeganovAnton :: PR: #7261
  • layer selection for ia3 by @arendu :: PR: #7417
  • Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
  • Fix sft dataset truncation by @hsiehjackson :: PR: #7464
  • fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
  • Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
  • SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
  • remove auto generated examples by @arendu :: PR: #7510
  • Add the argument to by @odelalleau :: PR: #7264
  • PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
  • fix a typo by @BestJuly :: PR: #7496
  • StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
  • fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
  • generalized chat sft prompt by @yidong72 :: PR: #7655
  • Set base frequency from config by @shan18 :: PR: #7734
  • Megatron LLM documentation updates by @ssh-meister :: PR: #7400
  • Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
  • Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
  • set context for text memmap to fork by @arendu :: PR: #7784
  • Support flash decoding by @hsiehjackson :: PR: #7744
  • update text server to support compute logprobs by @Zhilin123 :: PR: #7733
  • Revert PEFT eval fix by @ericharper :: PR: #7693
  • Fix tn duplex by @ekmb :: PR: #7808
  • Multimodal merge by @yaoyu-33 :: PR: #7728
  • Fix flash decoding precision by @hsiehjackson :: PR: #7852
  • Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
  • adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
  • Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
  • Add back import guard by @cuichenx :: PR: #7882
  • Change FP8 Defaults by @cuichenx :: PR: #7894
  • Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
  • Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
  • Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
  • upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
  • added missing torch import by @Davood-M :: PR: #7913
  • Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
  • Fix pinned triton version by @hsiehjackson :: PR: #7925
  • fix tp_overlap config var name by @xrennvidia :: PR: #7928
  • only enable query key scaling during fp16 by @gshennvm :: PR: #7946
  • Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
  • Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
  • Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061
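Several LLM entries touch rotary position embeddings: RoPE linear scaling (#7437), a configurable base frequency (#7734), and passing `rotary_base` through (#7933). The following sketch shows how the base and a linear scaling factor enter the rotation angles. It assumes interleaved channel pairs and plain linear (position-interpolation) scaling. Megatron-style code often rotates half-split channels instead, and the "dynamic" variant from #7437 is not reproduced here. All function names are hypothetical.

```python
import math


def rope_inv_freq(dim, base=10000.0):
    # One inverse frequency per channel pair: base ** (-2i / dim).
    if dim % 2:
        raise ValueError("dim must be even")
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_angles(position, dim, base=10000.0, scale=1.0):
    # Linear scaling: divide the position by `scale`, so `scale` times more
    # positions map into the angle range seen during training.
    scaled_pos = position / scale
    return [scaled_pos * f for f in rope_inv_freq(dim, base)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    # Rotate each interleaved pair (vec[2i], vec[2i + 1]) by its angle.
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.2, -0.4, 0.9]
# Query-key scores depend only on the relative offset (5 - 3 == 2 - 0).
print(dot(apply_rope(q, 5), apply_rope(k, 3)), dot(apply_rope(q, 2), apply_rope(k, 0)))
```

With `scale=2.0`, position 4 receives the same angles that position 2 receives without scaling. This is how linear scaling stretches a trained context window.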

General Improvements

Changelog
  • Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
  • SDE Tutorial minor fix by @Jorjeous :: PR: #7598
  • Temporary pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
  • Karpnv/issue 7320 by @karpnv :: PR: #7418
  • Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
  • Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
  • HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
  • [doc] fix broken link by @stas00 :: PR: #7481
  • dllogger - log on rank 0 only by @stas00 :: PR: #7513
  • Add two youtube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
  • defaults changed by @arendu :: PR: #7600
  • Bound transformers version in requirements by @athitten :: PR: #7620
  • Fix import error no module name model_utils by @menon92 :: PR: #7629
  • Fix in the confidence ensemble test by @Kipok :: PR: #7682
  • move core install to /workspace by @aklife97 :: PR: #7706
  • distributed checkpoint average script by @yidong72 :: PR: #7721
  • fix hybrid eval by @karpnv :: PR: #7757
  • fix(diarization-README): typo by @jqueguiner :: PR: #7771
  • Configure MCore logger by @mikolajblaz :: PR: #7781
  • Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
  • [Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
  • add guard if it's a distributed checkpoint by @gshennvm :: PR: #7845
  • Update transformers cache on Jenkins by @ericharper :: PR: #7854
  • Update README.rst for container update by @fayejf :: PR: #7844
  • Fix mcore conversion bug by @cuichenx :: PR: #7846
  • add comment on script and fix target check by @gshennvm :: PR: #7881
  • fix issues with convert_nemo_llama_to_hf.py by @Zhilin123 :: PR: #7922
  • Instructions for running ci on pr template by @ericharper :: PR: #7944
  • Distributed checkpoint averaging supports bf16 type by @yidong72 :: PR: #7888
  • Fix tokenizer argparse in scripts by @titu1994 :: PR: #8012
  • Check dependencies in installation script by @artbataev :: PR: #8019
  • [SE Tutorial] Use GPU for inference, when available by @anteju :: PR: #8048
  • update reqs by @ericharper :: PR: #8072
  • Remove typo by @ericharper :: PR: #8146
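PRs #7721 and #7888 add a distributed checkpoint averaging script. The following sketch shows only the underlying idea: an elementwise mean of each parameter across checkpoints that share the same parameter names and shapes. It uses plain dicts of Python float lists in place of tensors and sharded checkpoints, and `average_checkpoints` is a hypothetical name. When averaging bf16 weights, accumulating in a wider type such as float32 is a common way to limit rounding error.

```python
def average_checkpoints(state_dicts):
    """Elementwise mean of each parameter across a list of state dicts."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("checkpoints have mismatched parameter names")
    n = len(state_dicts)
    averaged = {}
    for name in names:
        values = [sd[name] for sd in state_dicts]
        if any(len(v) != len(values[0]) for v in values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*values) groups the i-th element of every checkpoint together.
        averaged[name] = [sum(column) / n for column in zip(*values)]
    return averaged


ckpts = [{"w": [1.0, 2.0], "b": [0.0]}, {"w": [3.0, 4.0], "b": [1.0]}]
print(average_checkpoints(ckpts))
```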

1. asset-post-v1.22.0-canary_asr.png (14.08 KB)
2. asset-post-v1.22.0-canary_ast_enX.png (11.14 KB)
3. asset-post-v1.22.0-canary_ast_Xen.png (15.16 KB)
4. asset-post-v1.22.0-canary_gradio_video_demo_v5_volume3x.mp4 (2.56 MB)
5. asset-post-v1.22.0-ctcws.png (199.2 KB)
6. asset-post-v1.22.0-ctcws_results.png (351.95 KB)
7. asset-post-v1.22.0-ctcws_scheme_1.png (209.31 KB)
8. asset-post-v1.22.0-ctcws_scheme_2.png (187.13 KB)
9. asset-post-v1.22.0-leaderboard.png (657.26 KB)
10. asset-post-v1.22.0-rnnt_topo.png (31.25 KB)
11. asset-post-v1.22.0-tdt_topo.png (33.15 KB)
12. nemo_audio_codec.png (151.92 KB)
13. ssb.png (123.28 KB)

查看:2024-01-11发行的版本