v.0.10.2

Release date: 2021-09-01 09:07:20

Latest espnet/espnet release: v.202409 (2024-10-01 14:28:01)

News

HuBERT training is now available!
- Try with egs2/librispeech/ssl1

GAN-based TTS models are now available!
- Joint text2mel and vocoder training
- End-to-end text-to-wave model (VITS) training
- Try with egs2/ljspeech/tts1

The from_pretrained function is now supported!

# e.g.
from espnet2.bin.asr_inference import Speech2Text
asr = Speech2Text.from_pretrained("model_tag")

from espnet2.bin.tts_inference import Text2Speech
tts = Text2Speech.from_pretrained("model_tag")

from espnet2.bin.enh_inference import SeparateSpeech
enh = SeparateSpeech.from_pretrained("model_tag")

from espnet2.bin.diar_inference import DiarizeSpeech
diar = DiarizeSpeech.from_pretrained("model_tag")

Please check the available pretrained models in espnet_model_zoo!
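
The four examples above share one pattern: import a task-specific inference class and call from_pretrained on it with a model tag. A minimal sketch of that pattern as a single helper follows. The names INFERENCE_CLASSES, resolve_inference_class, and load_pretrained are hypothetical and not part of ESPnet; only the module paths and class names come from the examples above. Calling load_pretrained assumes that espnet is installed (and, most likely, espnet_model_zoo for resolving model tags) and that "model_tag" is replaced with a real tag.

```python
import importlib

# Task name -> (module path, class name), copied from the release-note examples.
# This table is an illustration, not an ESPnet API.
INFERENCE_CLASSES = {
    "asr": ("espnet2.bin.asr_inference", "Speech2Text"),
    "tts": ("espnet2.bin.tts_inference", "Text2Speech"),
    "enh": ("espnet2.bin.enh_inference", "SeparateSpeech"),
    "diar": ("espnet2.bin.diar_inference", "DiarizeSpeech"),
}


def resolve_inference_class(task):
    """Return (module_path, class_name) for a task without importing ESPnet."""
    try:
        return INFERENCE_CLASSES[task]
    except KeyError:
        known = ", ".join(sorted(INFERENCE_CLASSES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


def load_pretrained(task, model_tag):
    """Import the inference class lazily and call its from_pretrained.

    Assumption: espnet is installed and model_tag is a valid tag.
    The import happens here so that the lookup table above stays usable
    even on machines without ESPnet.
    """
    module_path, class_name = resolve_inference_class(task)
    module = importlib.import_module(module_path)
    inference_class = getattr(module, class_name)
    return inference_class.from_pretrained(model_tag)
```

With ESPnet installed, load_pretrained("asr", "model_tag") would be equivalent to the first example above, Speech2Text.from_pretrained("model_tag").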

New Features

[New Features][ESPnet1] Intermediate CTC + Stochastic depth #3274 by @jaesong
[New Features][ESPnet2] Add new trainer for GAN-based training #3436 by @kan-bayashi
[New Features][ESPnet2][ASR] Add Hubert model in Espnet2/Refactor from #3458 #3512 by @Jzmo
[New Features][ESPnet2][ASR] batch decode with k2 ctc #3433 by @glynpu
[New Features][ESPnet2][ASR][SE] Support from_pretrained for ASR and ENH #3535 by @kan-bayashi
[New Features][ESPnet2][DIAR] Support from_pretrained for DIAR #3537 by @YushiUeda
[New Features][ESPnet2][SE] Adding portable speech enhancement scripts for other tasks #3487 by @Emrys365
[New Features][ESPnet2][TTS] Add GAN-TTS task with VITS #3449 by @kan-bayashi
[New Features][ESPnet2][TTS] Support SID and LID inputs for TTS models #3490 by @kan-bayashi
[New Features][ESPnet2][TTS] Support from_pretrained function in Text2Speech #3532 by @kan-bayashi
[New Features][ESPnet2][TTS] Support parallel_wavegan vocoders in tts_inference.py #3513 by @kan-bayashi
[New Features][ESPnet2][TTS] Support joint training of text2mel and vocoder #3501 by @kan-bayashi
[New Features][ESPnet2][TTS] Support language ID input for espnet2 TTS #3489 by @kan-bayashi
[New Features][ESPnet2][TTS] Support speaker id input for TTS models #3452 by @kan-bayashi

Enhancement

[Enhancement][ESPnet2][CTC segmentation][README] Fix CTC Segmentation #3500 by @shirayu
[Enhancement][ESPnet2][TTS] Add VITS-related modules #3448 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add cython code for VITS #3483 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add joint training config example #3508 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add melgan module for joint training #3516 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add parallel wavegan module for joint training #3515 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add style melgan module for joint training #3517 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Add vocoder modules related to VITS #3439 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Change Text2Speech class output format #3437 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Follow up of the support speaker id input #3453 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Support cleaner option in phn converter util #3450 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Support language id in VITS #3499 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Support linear spectrogram #3438 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Support new g2p functions for various languages #3463 by @kan-bayashi
[Enhancement][ESPnet2][TTS] Update the TTS inference #3498 by @kan-bayashi
[Enhancement][ESPnet2][SLU][README] Add support for intent classification on SLURP dataset #3482 by @siddhu001
[Enhancement][ESPnet2][SLU][README] Add NLU post-encoder using Hugging Face Transformers #3410 by @akreal

Recipe

[Recipe][ESPnet1][ASR] Mucs21 subtask1 #3376 by @sanket0211
[Recipe][ESPnet2][ASR][README] Add Swahili ASR recipe #3485 by @akreal
[Recipe][ESPnet2][ASR][README] Rename swahili recipe to iwslt21_low_resource #3522 by @akreal
[Recipe][ESPnet2][DIAR][README] Modify ESPnet2 diarization recipe #3524 by @YushiUeda
[Recipe][ESPnet2][ESPnet1][ASR] Espnet2 mucs_subtask2 #3415 by @bloodraven66
[Recipe][ESPnet2][ESPnet1][ASR] mucs subtask1 #3417 by @bloodraven66
[Recipe][ESPnet2][SE] Add Voicebank (vctk_noisy) script #3486 by @neillu23
[Recipe][ESPnet2][TTS] Add missing configs for LibriTTS recipe #3455 by @kan-bayashi
[Recipe][ESPnet2][TTS] Update VITS config comments and settings #3528 by @kan-bayashi
[Recipe][ESPnet2][TTS] aishell3 dataset preparation #3505 by @actboy
[Recipe][ESPnet2][TTS][README] Add CSS10 recipe for ESPnet2-TTS #3464 by @kan-bayashi
[Recipe][ESPnet2][TTS][README] Add JtubeSpeech Recipe #3459 by @Takaaki-Saeki
[Recipe][ESPnet2][TTS][README] Add SIWIS recipe #3460 by @takenori-y
[Recipe][ESPnet2][TTS][README] TTS recipe for J-KAC corpus #3468 by @TanUkkii007
[Recipe][ESPnet2][TTS][README] TTS recipes for thchs30 and aishell3 #3470 by @ftshijt
[Recipe][ESPnet2][TTS][README] Update JMD README #3531 by @takenori-y
[Recipe][ESPnet2][TTS][README] Update SIWIS README #3509 by @takenori-y
[Recipe][ESPnet2][SLU][README] Predict ASR transcript along with Intent for SLU #3480 by @siddhu001
[Recipe][ESPnet2][SLU][README] Update SWBD DA configuration #3425 by @akreal

Bugfix

[Bugfix][ESPnet2] Add return_complex=False for stft #3476 by @D-X-Y
[Bugfix][ESPnet2] Dynamic import for the ngram function #3420 by @ftshijt
[Bugfix][ESPnet2][README][Recipe] Add the GigaSpeech normalization and fix the WER #3519 by @chaisz19
[Bugfix][ESPnet2][TTS] Add duration and focus_rate in output dict #3469 by @kan-bayashi
[Bugfix][ESPnet2][TTS] Add missing symlink to trim_silence.py for ESPnet2 #3467 by @kan-bayashi
[Bugfix][ESPnet2][TTS] Fix wrong arguments in pretrained vocoder wrapper #3525 by @kan-bayashi
[Bugfix][ESPnet2][TTS] Revert wrongly removed lines in tts.sh #3503 by @kan-bayashi
[Bugfix][ESPnet2][TTS][Typo] Fix typo in hifigan #3504 by @kan-bayashi

Refactoring

[Refactoring][ESPnet1][ASR][RNNT][README] Transducer v5 #3217 by @b-flo
[Refactoring][ESPnet2][SE][DIAR] Remove prefix enh_ and diar_ #3538 by @kan-bayashi
[Refactoring][ESPnet2][TTS] Refactor TTS modules in ESPnet2 #3497 by @kan-bayashi
[Refactoring][ESPnet2][TTS] Remove the support of feats_type=fbank/stft in ESPnet2-TTS #3514 by @kan-bayashi

Others

[CI] Fix k2 version in CI using conda #3493 by @kan-bayashi
[CI] Fix test condition #3527 by @kan-bayashi
[CI][Installation] Update Sentencepiece and add python 3.9 to CI #3422 by @shirayu
[Docker] Docker Updates #3393 by @Fhrozen
[Documentation] Update the tutorial about maxlenratio usage #3523 by @akreal
[Documentation][ESPnet2][TTS] Update README.md #3502 by @kan-bayashi
[Installation][README] Added a link and a classifier for Python 3.9 #3440 by @shirayu
[Typo] Fix typos in "egs" #3447 by @shirayu
[Typo][Documentation] Fix typos in "doc" #3441 by @shirayu
[Typo][Documentation] Fix typos in "utils" #3442 by @shirayu
[Typo][ESPnet1][MT] Fix typos in "espnet" #3444 by @shirayu
[Typo][ESPnet2] Fix typos in "espnet2" #3443 by @shirayu
[Typo][ESPnet2][README] Fix typos in "egs2" #3445 by @shirayu

Acknowledgements

Special thanks to @D-X-Y, @Emrys365, @Fhrozen, @Jzmo, @Takaaki-Saeki, @TanUkkii007, @YushiUeda, @actboy, @akreal, @b-flo, @bloodraven66, @chaisz19, @ftshijt, @glynpu, @jaesong, @kan-bayashi, @neillu23, @sanket0211, @shirayu, @siddhu001, @takenori-y.
