v.0.10.2
Release date: 2021-09-01 09:07:20
Latest espnet/espnet release: v.202409 (2024-10-01 14:28:01)
News
- Hubert training is now available!
  - Try with `egs2/librispeech/ssl1`
- GAN-based TTS model is now available!
  - Joint text2mel and vocoder training
  - End-to-end text-to-wave model (VITS) training
  - Try with `egs2/ljspeech/tts1`
- Support `from_pretrained` function!
  Please check the available pretrained models in espnet_model_zoo!

```python
# e.g.
from espnet2.bin.asr_inference import Speech2Text
asr = Speech2Text.from_pretrained("model_tag")

from espnet2.bin.tts_inference import Text2Speech
tts = Text2Speech.from_pretrained("model_tag")

from espnet2.bin.enh_inference import SeparateSpeech
enh = SeparateSpeech.from_pretrained("model_tag")

from espnet2.bin.diar_inference import DiarizeSpeech
diar = DiarizeSpeech.from_pretrained("model_tag")
```
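Since the four inference classes named above share the same `from_pretrained` entry point, one possible way to select among them by task is sketched below. The helper `load_pretrained` and the table `INFERENCE_CLASSES` are hypothetical names introduced only for this illustration and are not part of ESPnet. The module paths and class names are the ones listed in this release note, and `"model_tag"` remains a placeholder for a tag from espnet_model_zoo.

```python
import importlib

# Module path and class name per task, copied from the release-note example.
# The dictionary itself is a hypothetical convenience, not an ESPnet API.
INFERENCE_CLASSES = {
    "asr": ("espnet2.bin.asr_inference", "Speech2Text"),
    "tts": ("espnet2.bin.tts_inference", "Text2Speech"),
    "enh": ("espnet2.bin.enh_inference", "SeparateSpeech"),
    "diar": ("espnet2.bin.diar_inference", "DiarizeSpeech"),
}


def load_pretrained(task, model_tag):
    """Hypothetical helper: import the class for `task` lazily and call
    its `from_pretrained(model_tag)` class method."""
    if task not in INFERENCE_CLASSES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(INFERENCE_CLASSES)}"
        )
    module_name, class_name = INFERENCE_CLASSES[task]
    # The import happens here, so ESPnet is only required when a model
    # is actually requested.
    module = importlib.import_module(module_name)
    return getattr(module, class_name).from_pretrained(model_tag)
```

With ESPnet and espnet_model_zoo installed, `load_pretrained("tts", "model_tag")` would be equivalent to the `Text2Speech.from_pretrained("model_tag")` call shown in the release note.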
New Features
- [New Features][ESPnet1] Intermediate CTC + Stochastic depth #3274 by @jaesong
- [New Features][ESPnet2] Add new trainer for GAN-based training #3436 by @kan-bayashi
- [New Features][ESPnet2][ASR] Add Hubert model in Espnet2/Refactor from #3458 #3512 by @Jzmo
- [New Features][ESPnet2][ASR] batch decode with k2 ctc #3433 by @glynpu
- [New Features][ESPnet2][ASR][SE] Support `from_pretrained` for ASR and ENH #3535 by @kan-bayashi
- [New Features][ESPnet2][DIAR] Support `from_pretrained` for DIAR #3537 by @YushiUeda
- [New Features][ESPnet2][SE] Adding portable speech enhancement scripts for other tasks #3487 by @Emrys365
- [New Features][ESPnet2][TTS] Add GAN-TTS task with VITS #3449 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support SID and LID inputs for TTS models #3490 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support `from_pretrained` function in `Text2Speech` #3532 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support `parallel_wavegan` vocoders in `tts_inference.py` #3513 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support joint training of text2mel and vocoder #3501 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support language ID input for espnet2 TTS #3489 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support speaker id input for TTS models #3452 by @kan-bayashi
Enhancement
- [Enhancement][ESPnet2][CTC segmentation][README] Fix CTC Segmentation #3500 by @shirayu
- [Enhancement][ESPnet2][TTS] Add VITS-related modules #3448 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add cython code for VITS #3483 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add joint training config example #3508 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add melgan module for joint training #3516 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add parallel wavegan module for joint training #3515 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add style melgan module for joint training #3517 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add vocoder modules related to VITS #3439 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Change Text2Speech class output format #3437 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Follow up of the support speaker id input #3453 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support cleaner option in phn converter util #3450 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support language id in VITS #3499 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support linear spectrogram #3438 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support new g2p functions for various languages #3463 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Update the TTS inference #3498 by @kan-bayashi
- [Enhancement][ESPnet2][SLU][README] Add support for intent classification on SLURP dataset #3482 by @siddhu001
- [Enhancement][ESPnet2][SLU][README] Add NLU post-encoder using Hugging Face Transformers #3410 by @akreal
Recipe
- [Recipe][ESPnet1][ASR] Mucs21 subtask1 #3376 by @sanket0211
- [Recipe][ESPnet2][ASR][README] Add Swahili ASR recipe #3485 by @akreal
- [Recipe][ESPnet2][ASR][README] Rename `swahili` recipe to `iwslt21_low_resource` #3522 by @akreal
- [Recipe][ESPnet2][DIAR][README] Modify ESPnet2 diarization recipe #3524 by @YushiUeda
- [Recipe][ESPnet2][ESPnet1][ASR] Espnet2 mucs_subtask2 #3415 by @bloodraven66
- [Recipe][ESPnet2][ESPnet1][ASR] mucs subtask1 #3417 by @bloodraven66
- [Recipe][ESPnet2][SE] Add Voicebank (vctk_noisy) script #3486 by @neillu23
- [Recipe][ESPnet2][TTS] Add missing configs for LibriTTS recipe #3455 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update VITS config comments and settings #3528 by @kan-bayashi
- [Recipe][ESPnet2][TTS] aishell3 dataset preparation #3505 by @actboy
- [Recipe][ESPnet2][TTS][README] Add CSS10 recipe for ESPnet2-TTS #3464 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add JtubeSpeech Recipe #3459 by @Takaaki-Saeki
- [Recipe][ESPnet2][TTS][README] Add SIWIS recipe #3460 by @takenori-y
- [Recipe][ESPnet2][TTS][README] TTS recipe for J-KAC corpus #3468 by @TanUkkii007
- [Recipe][ESPnet2][TTS][README] TTS recipes for thchs30 and aishell3 #3470 by @ftshijt
- [Recipe][ESPnet2][TTS][README] Update JMD README #3531 by @takenori-y
- [Recipe][ESPnet2][TTS][README] Update SIWIS README #3509 by @takenori-y
- [Recipe][ESPnet2][SLU][README] Predict ASR transcript along with Intent for SLU #3480 by @siddhu001
- [Recipe][ESPnet2][SLU][README] Update SWBD DA configuration #3425 by @akreal
Bugfix
- [Bugfix][ESPnet2] Add return_complex=False for stft #3476 by @D-X-Y
- [Bugfix][ESPnet2] Dynamic import for the ngram function #3420 by @ftshijt
- [Bugfix][ESPnet2][README][Recipe] Add the GigaSpeech normalization and fix the WER #3519 by @chaisz19
- [Bugfix][ESPnet2][TTS] Add duration and focus_rate in output dict #3469 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Add missing symlink to trim_silence.py for ESPnet2 #3467 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Fix wrong arguments in pretrained vocoder wrapper #3525 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Revert wrongly removed lines in `tts.sh` #3503 by @kan-bayashi
- [Bugfix][ESPnet2][TTS][Typo] Fix typo in hifigan #3504 by @kan-bayashi
Refactoring
- [Refactoring][ESPnet1][ASR][RNNT][README] Transducer v5 #3217 by @b-flo
- [Refactoring][ESPnet2][SE][DIAR] Remove prefix `enh_` and `diar_` #3538 by @kan-bayashi
- [Refactoring][ESPnet2][TTS] Refactor TTS modules in ESPnet2 #3497 by @kan-bayashi
- [Refactoring][ESPnet2][TTS] Remove the support of feats_type=fbank/stft in ESPnet2-TTS #3514 by @kan-bayashi
Others
- [CI] Fix k2 version in CI using conda #3493 by @kan-bayashi
- [CI] Fix test condition #3527 by @kan-bayashi
- [CI][Installation] Update Sentencepiece and add python 3.9 to CI #3422 by @shirayu
- [Docker] Docker Updates #3393 by @Fhrozen
- [Documentation] Update the tutorial about maxlenratio usage #3523 by @akreal
- [Documentation][ESPnet2][TTS] Update README.md #3502 by @kan-bayashi
- [Installation][README] Added a link and a classifier for Python 3.9 #3440 by @shirayu
- [Typo] Fix typos in "egs" #3447 by @shirayu
- [Typo][Documentation] Fix typos in "doc" #3441 by @shirayu
- [Typo][Documentation] Fix typos in "utils" #3442 by @shirayu
- [Typo][ESPnet1][MT] Fix typos in "espnet" #3444 by @shirayu
- [Typo][ESPnet2] Fix typos in "espnet2" #3443 by @shirayu
- [Typo][ESPnet2][README] Fix typos in "egs2" #3445 by @shirayu
Acknowledgements
Special thanks to @D-X-Y, @Emrys365, @Fhrozen, @Jzmo, @Takaaki-Saeki, @TanUkkii007, @YushiUeda, @actboy, @akreal, @b-flo, @bloodraven66, @chaisz19, @ftshijt, @glynpu, @jaesong, @kan-bayashi, @neillu23, @sanket0211, @shirayu, @siddhu001, @takenori-y.