Releases: espnet/espnet
ESPnet version 202412
New Features
Enhancement
- [Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki
Recipe
- [Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
- [Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
- [Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
- [Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz
Bugfix
- [Bugfix][ESPnet2][Diarization] Fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
- [Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
- [Bugfix][ESPnet2][ASR] removed 'continue' statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
- [Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
- [Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen
Documentation
- [Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
- [Documentation] Update CI reference #5939 by @emmanuel-ferdman
- [Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
- [Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon
Others
- [Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
- [Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
- [Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay
Acknowledgements
Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.
ESPnet version 202409
New Features
- [New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
- [New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
- [New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
- [New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
- [New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
- [New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
- [New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti
Enhancement
- [Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
- [Enhancement][ESPnet2][Codec] Categorical Balanced Chunk iterator #5894 by @ftshijt
- [Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
- [Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712
Recipe
- [Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
- [Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
- [Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
- [Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
- [Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
- [Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
- [Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
- [Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
- [Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
- [Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
- [Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
- [Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
- [Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
- [Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
- [Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
- [Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
- [Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
- [Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
- [Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
- [Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
- [Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
- [Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
- [Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
- [Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
- [Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
- [Recipe][ESPnet2][ASR] add interspeech2024_dsu_challenge/asr2 #5627 by @simpleoier
- [Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt
Bugfix
- [Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
- [Bugfix] Bugfix/homepage #5885 by @Masao-Someki
- [Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
- [Bugfix] Bug fix for source link #5883 by @Masao-Someki
- [Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
- [Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
- [Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
- [Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
- [Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
- [Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
- [Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
- [Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
- [Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
- [Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
- [Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
- [Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
- [Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
- [Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
- [Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
- [Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
- [Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
- [Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
- [Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo
Documentation
- [Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
- [Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
- [Documentation] Add script to make release note from milestone #5653 by @kan-bayashi
Refactoring
- [Refactoring] Modified easy to ez #5719 by @Masao-Someki
Others
- [Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
- [Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
- [Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
- [Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
- [Others] Update README info #5852 by @ftshijt
- [Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
- [Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
- [Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
- [Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
- [Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
- [Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
- [Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
- [Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
- [Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
- [Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
- [Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
- [Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
- [Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
- [Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
- [Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
- [Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba
Acknowledgements
Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.
ESPnet version 202402
News
We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!
New Features
- [New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-based SE model to ESPnet-SE #5572 by @LiChenda
- [New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
- [New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
- [New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
- [New Features][ESPnet2][CI][SID] ESPnet-SPK: major update #5408 by @Jungjee
- [New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki
Enhancement
- [Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
- [Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
- [Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
- [Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
- [Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt
Recipe
- [Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
- [Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
- [Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
- [Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
- [Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
- [Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
- [Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
- [Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
- [Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
- [Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
- [Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
- [Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
- [Recipe][ESPnet1] Added clean speech results #5649 by @linan2
- [Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
- [Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
- [Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
- [Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001
Bugfix
- [Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
- [Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
- [Bugfix] Fix CI #5642 by @siddhu001
- [Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
- [Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
- [Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
- [Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
- [Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
- [Bugfix][ESPnet2] Fix except #5567 by @takenori-y
- [Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
- [Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
- [Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
- [Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
- [Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso
Documentation
- [Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
- [Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23
Others
- [Others] Update Discord Invitation Link #5578 by @Fhrozen
- [Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.
ESPnet version 202310
What's Changed
- Support arbitrary language finetune for Whisper models. by @pengchengguo in #5344
- Update Dipco Data URL by @Fhrozen in #5391
- Update readme in TEMPLATE/svs1 by @linyueqian in #5394
- add gramvaani asr recipe by @bloodraven66 in #5366
- ESPnet-SPK: sampler by @Jungjee in #5365
- Adding general data augmentation methods for speech preprocessing by @Emrys365 in #5370
- Update of several SE recipes and some minor fixes by @Emrys365 in #5401
- Reproducing MIMOIRIS by @YoshikiMas in #5409
- Kathbath asr by @bloodraven66 in #5369
- Add pytorch2.0.1 to CI by @kamo-naoyuki in #5413
- [skip ci] Update README.md by @kamo-naoyuki in #5417
- In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in #5416
- Docker updates for local builds by @Fhrozen in #5406
- fix typo in TEMPLATE/svs1/README.md by @linyueqian in #5426
- Update install_mwerSegmenter.sh by @sw005320 in #5437
- Support Whisper-style training as a new task S2T by @pyf98 in #5120
- fix twice numpy installation issue by @kan-bayashi in #5447
- Add Whisper SOT recipe for Librimix by @LiChenda in #5371
- Update for the JOSS paper editor review by @neillu23 in #5418
- Add the VOiCES recipe for ASR by @Emrys365 in #5448
- Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in #5445
- [WIP] create recipe for acesinger by @linyueqian in #5431
- Add BibleTTS recipe by @wyh2000 in #5436
- ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in #5434
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5427
- Simple fix to reduce test_slu_inference time by @siddhu001 in #5460
- Do not use root logger in Beamsearch by @vsd-vector in #5454
- Fix whisper test by @siddhu001 in #5464
- Add doc for OWSM by @pyf98 in #5463
- Speech-to-speech translation Task by @ftshijt in #4859
- AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in #5456
- Support LoRA based large model finetuning. by @pengchengguo in #5400
- Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in #5323
- Add phonemized LibriTTS ASR recipe by @akreal in #5466
- Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in #5414
- speed up TFGridNet code by @zqwang7 in #5395
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5468
- ASR2 recipe on Tedlium3 dataset by @kohei0209 in #5331
- Create README.md in OWSM v1 by @pyf98 in #5489
- Update setup.py by @sw005320 in #5490
- Fix default value in ML-SUPERB by @ftshijt in #5492
- Fix bugs of Whisper SOT. by @pengchengguo in #5494
- Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in #5441
- Add a new SE recipe combining five public corpora by @Emrys365 in #5484
- Update .mergify.yml by @kamo-naoyuki in #5502
- update version to 202310 by @kan-bayashi in #5501
New Contributors
- @linyueqian made their first contribution in #5394
- @mdecerbo made their first contribution in #5416
- @zuazo made their first contribution in #5445
- @wyh2000 made their first contribution in #5436
- @yichen14 made their first contribution in #5434
- @vsd-vector made their first contribution in #5454
- @ms-dot-k made their first contribution in #5456
- @juice500ml made their first contribution in #5323
- @kohei0209 made their first contribution in #5331
Full Changelog: v.202308...v.202310
ESPnet version 202308
What's Changed
- Update tutorial by @ftshijt in #4648
- Update tutorials by @ftshijt in #4898
- add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in #5130
- Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in #5162
- [SVS] Add new recipes by @A-Quarter-Mile in #5158
- Update README.md of CHiME-7 DASR: fixing typos by @popcornell in #5166
- Fix typo in CONTRIBUTING.md by @eltociear in #5167
- CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in #5168
- Update TD-SpeakerBeam by @Emrys365 in #5155
- Add pre-trained causal speech separation model and streaming demo by @LiChenda in #5172
- KSC recipe by @khassanoff in #5171
- [SVS] Add new recipe by @A-Quarter-Mile in #5173
- Update AphasiaBank Recipe by @tjysdsg in #5104
- fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in #5159
- Add installer for ParallelWaveGAN by @ftshijt in #4052
- [GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in #5123
- [SVS] Update docs README.md by @South-Twilight in #5178
- Update SVS README.md by @jerryuhoo in #5180
- Adding eendss models by @soumimaiti in #5157
- 2022fall new task tutorial by @ftshijt in #5186
- [SVS] Updates for recipes by @A-Quarter-Mile in #5187
- [GAN SVS] fix phoneme predictor by @jerryuhoo in #5188
- Update generate_librimix_sd.sh by @leepeiying in #5182
- Bug fix for #5195 by @YosukeHiguchi in #5196
- [SVS] Update on recipes by @A-Quarter-Mile in #5197
- Update preprocessor.py by @sw005320 in #5200
- Minor fixes for ML-SUPERB by @ftshijt in #5202
- Quick fix for whisper specaug by @siddhu001 in #5206
- espnet-spk data preparation part by @Jungjee in #5184
- Fix M4singer multi-spk recipe by @ftshijt in #5201
- Update Dataset link for mlsuperb by @ftshijt in #5216
- Fix bug when score_type is set to normal in ml_superb by @ftshijt in #5217
- Add new functions and fix some bugs in SE by @Emrys365 in #5193
- Update import order by @ftshijt in #5229
- Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in #5228
- Standalone Transducer v1.1 by @b-flo in #5140
- Small fixes for Transducer by @b-flo in #5247
- add asr2 task and librispeech recipe as an example. by @simpleoier in #5181
- fix norm compatibility in scale discriminator by @kan-bayashi in #5240
- CFSD, SECS metrics for TTS by @imdanboy in #5235
- Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in #5246
- Fix bugs in mfa_format.py by @G-Thor in #5223
- New features for SVS by @ftshijt in #5245
- re-fix norm compatibility in scale discriminator by @kan-bayashi in #5249
- add conv1d subsampling 3 and egs2/librispeech/asr2 wavlm_large_21 kmeans (1000/2000) results by @simpleoier in #5252
- Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in #5212
- Fix a bug in score script for ML-SUPERB by @ftshijt in #5254
- Refactor prep_segments in SVS by @jerryuhoo in #5210
- A minor fix for num_splits_ssl for training by @ftshijt in #5262
- [SVS] add singing tacotron by @A-Quarter-Mile in #5233
- Add script to use speaker averaged xvectors in TTS training by @G-Thor in #5244
- Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in #5267
- Some name update for ml-superb by @ftshijt in #5276
- Add support for K2 pruned transducer loss by @b-flo in #5268
- Fix Transducer doc by @b-flo in #5306
- Update installation.md by @kamo-naoyuki in #5291
- Update install_nkf.sh by @sw005320 in #5300
- Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in #5310
- Update README.md by @sw005320 in #5315
- Update setup.py by @sw005320 in #5316
- Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in #5251
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5294
- A few updates for asr2 and hubert by @simpleoier in #5285
- Add decode_options and hyp_cleaner in evaluate_whisper_inference by @pyf98 in #5272
- update pyworld version by @kan-bayashi in #5319
- fix a data preparation issue for librimix recipe. by @LiChenda in #5322
- Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in #5289
- fix the s3prl frontend gradient backprop bug, ensuring feature_grad_mult=1.0 by @simpleoier in #5297
- ESPNet-SPK part 2 - training by @Jungjee in #5258
- remove some tests in espnet1 integration test by @sw005320 in #5328
- Fix random segments by @iamanigeeit in #5274
- Skip CI for draft PR by @ftshijt in #5333
- Update cancel.yml by @kan-bayashi in #5334
- Update several SE recipes and bash scripts by @Emrys365 in #5327
- Add PULL_REQUEST_TEMPLATE.md by @kan-bayashi in #5340
- ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in #5314
- Minimize espnet2 integration test by @kan-bayashi in #5324
- PR Labels for CI control by @Fhrozen in #5320
- Split ci into several jobs by @kan-bayashi in #5343
- Update CONTRIBUTING.md by @sw005320 in #5335
- Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in #5341
- Fix documentation skip CI by @Fhrozen in #5351
- Update the usage by @sw005320 in #5349
- Docker Update by @Fhrozen in #5321
- Update installation.md by @sw005320 in #5348
- Fix doc condition by @kan-bayashi in #5355
- Update issue templates by @sw005320 in #5357
- Update Contribution.md by @Fhrozen in #5352
- Fix .mergify condition by @kan-bayashi in #5354
- Reduce ffmpeg installation time in ci by @kan-bayashi in #5356
- Update CI table by @kan-bayashi in #5359
- Clean workflow files by @kan-bayashi in #5360
- Couple of tweaks for asr2.sh for the HF hub upload by @akreal in #5362
- Update TEMPLATE_HF_Readme.md (fix bash typo) by @akreal in #5361
- Add discrete-token ASR for LibriSpeech 100h by @akreal in #5350
- Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in #5342
- Fix bug in ngram training in slu.sh by @siddhu001 in #5364
- Add musdb18 recipe for music source separation by @Emrys365 in #5338
- Bugfix: JETS CTCLoss by @imdanboy in #5288
- Check the value of n_shift == upsample_factor in GAN_TTS by @i...
ESPnet version 202304
What's Changed
- Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in #4888
- Apply the latest black by @kamo-naoyuki in #4907
- Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in #4906
- How2 fix README, incorrect url by @roshansh-cmu in #4902
- standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in #4905
- Fix typo in lrs/README.md by @eltociear in #4911
- MSUPERB setting update by @ftshijt in #4913
- Update test_import.yaml to install numba by @kamo-naoyuki in #4918
- update pyopenjtalk version to 0.3.0 by @kan-bayashi in #4912
- CHiME-7 Task1 recipe by @popcornell in #4894
- Update CHiME-7 Task 1 README.md by @popcornell in #4920
- Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in #4922
- Add few shot subset for mSuperb multilingual setting by @guapaQAQ in #4923
- Fix existing bugs in the TSE task by @Emrys365 in #4915
- IAM OCR recipe updates by @kenzheng99 in #4927
- Fixing some issues with chime7-task1 baseline by @popcornell in #4925
- set default none decoder for ASR by @ftshijt in #4917
- Update inference and training setting for mSuperb multilingual model by @guapaQAQ in #4932
- Add E-Branchformer Transducer results by @pyf98 in #4933
- add tf-gridnet by @zqwang7 in #4864
- Fixes + Channel Selection for CHiME-7 Task by @popcornell in #4934
- fix extracted feature dummy generation by @roshansh-cmu in #4926
- Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in #4941
- CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in #4942
- Update show_asr_result.sh by @kamo-naoyuki in #4943
- CHiME-7 DASR correct development results by @popcornell in #4946
- Fix 'floordiv is deprecated' warnings by @fujimotos in #4945
- Added WSLII installation instruction by @sw005320 in #4949
- Update Muskits by @A-Quarter-Mile in #4931
- Set a longer time execution threshold for related failed time-outs CI by @ftshijt in #4962
- Modify data prep for mSUPERB multilingual by @guapaQAQ in #4965
- Add E-Branchformer results in some recipes by @pyf98 in #4958
- Add 'six' as a required Python module by @fujimotos in #4964
- add msuperb linguistic analysis by @hhhaaahhhaa in #4938
- Fix a 'ref_channel'-related issue in espnet2/bin/enh_inference.py by @Emrys365 in #4972
- Add E-Branchformer results in slurp_entity by @pyf98 in #4971
- Add Conformer and E-Branchformer results in fisher_spanish_callhome ASR by @pyf98 in #4976
- [SVS] Add Joint-training by @A-Quarter-Mile in #4977
- Update the chunk iterator for the TSE task by @Emrys365 in #4929
- update msuperb LID scoring script by @hhhaaahhhaa in #4979
- add multilingual+lid lid score generation by @hhhaaahhhaa in #4982
- Add python=3.10 to CI by @kamo-naoyuki in #4627
- LID score v2 by @hhhaaahhhaa in #4983
- Fix ci by @kamo-naoyuki in #4985
- Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in #4986
- Remove six by @kamo-naoyuki in #4988
- Modify format_wav_scp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in #4997
- Fix Whisper tokenizer CI error by @slSeanWU in #5004
- fix s3prl upstream attribute bug by @jwrh in #5003
- [Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in #4994
- Fix typeguard version by @silvanocerza in #5009
- Add .pre-commit-config.yaml by @kamo-naoyuki in #5011
- Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in #4998
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5012
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5014
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5015
- [Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in #5019
- Modify asr.sh by @kamo-naoyuki in #5020
- [SVS] Improve visinger by @jerryuhoo in #5022
- Use scripts/utils/print_args.sh instead of pyscripts/utils/print_args.py by @kamo-naoyuki in #5025
- Add docstring in extra_path.sh by @kamo-naoyuki in #5028
- Update installation.md by @kamo-naoyuki in #5029
- Update README.md by @kamo-naoyuki in #5030
- Update README.md by @kamo-naoyuki in #5031
- Change bc to python by @kamo-naoyuki in #5032
- Update tools/Makefile and path.sh by @kamo-naoyuki in #5027
- Fix for format_wav_scp.py by @kamo-naoyuki in #5038
- Add execute permission to install_ice_g2p.sh by @kamo-naoyuki in #5040
- Bug fix of #5025 by @kamo-naoyuki in #5039
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5041
- Update README.md by @kamo-naoyuki in #5042
- Update README.md by @kamo-naoyuki in #5043
- Update README.md by @kamo-naoyuki in #5045
- Fix in gen_task1_data.sh from CHiME7 by @boeddeker in #4953
- Update README.md by @eml914 in #5044
- Add installers/install_ffmpeg.sh by @kamo-naoyuki in #5046
- Fix broken links reported by #5048 by @ShigekiKarita in #5050
- fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in #4978
- Add miniconda in gitignore by @pyf98 in #5052
- CHiME-7 DASR fixes from participants feedback by @popcornell in #4999
- Fix the condition for maxlen warning in beam search by @pyf98 in #5055
- Fixed SQLalchemy version for MFA by @Fhrozen in #5059
- Support Multi-Blank Transducer in Espnet2 by @jctian98 in #4876
- Fix chime7 DASR task1 run.sh by @kamo-naoyuki in #5060
- CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in #5061
- Add test_format_wav_scp_sh.bats by @kamo-naoyuki in #5062
- Update documentation by @kamo-naoyuki in #5063
- Support SOT training on LibriMix data. by @pengchengguo in #4861
- Update check_install.py by @kamo-naoyuki in #5066
- Tedlium3 recipe by @Some-random in #5068
- Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in #5074
- Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in #5075
- Clean ML-SUPERB by @ftshijt in #5067
- CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in #5054
- Chime7-task1 diarization (updated results) by @popcornell in #5088
- Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in #5084
- [SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pu...
ESPnet version 202301
What's Changed
- Initialize VISinger branch by @ftshijt in #4683
- Update VISInger branch by @ftshijt in #4705
- Update UASR branch with latest ESPnet functions by @ftshijt in #4752
- Update uasr by @ftshijt in #4770
- Shell scripts for UASR processing by @ftshijt in #4769
- Uasr python scripts by @DongjiGao in #4791
- Update visinger by @ftshijt in #4818
- Update test_custom_transducer.py by @sw005320 in #4826
- Update asr.sh by @sw005320 in #4827
- Fixed pad mode for librosa.stft by @Masao-Someki in #4832
- Add E-Branchformer models in some recipes by @pyf98 in #4833
- Fix data prep in GigaSpeech by @pyf98 in #4836
- time sync decoding for asr by @brianyan918 in #4792
- Remove duplicated VOXFORGE in db.sh (line81 and line157) by @pyf98 in #4840
- Fix argument parsing for non_linguistic_symbols in asr.sh by @pyf98 in #4841
- Add a warning statement when the hypo length equals to the max out length. by @pengchengguo in #4843
- Add target speaker extraction (TSE) functions by @Emrys365 in #4823
- Multilingual superb by @ftshijt in #4824
- VISinger by @jerryuhoo in #4689
- Update VISInger to latest by @ftshijt in #4849
- VISinger for singing voice synthesis by @ftshijt in #4848
- Reduce word counts for ESPnet-SE++ Joss paper by @neillu23 in #4844
- Add E-Branchformer configs and models in ASR recipes by @pyf98 in #4837
- Address Muskits updates on README by @ftshijt in #4850
- Minor fix for MSUPERB recipe by @ftshijt in #4851
- Update for the latest changes in the draft (minor changes) by @neillu23 in #4852
- Add E-Branchformer results on Librispeech by @kkim-asapp in #4856
- Update hubert implementation. by @simpleoier in #4747
- VISinger unit test by @jerryuhoo in #4855
- Minor fix to commonvoice espnet1 by @ftshijt in #4862
- [WIP] Add S4 decoder in ESPnet2 by @m-koichi in #4845
- Update hubert feature and acknowledge information in related Readmes. by @simpleoier in #4863
- Generating MFA alignments by @Fhrozen in #4803
- [WIP] EURO uasr scripts by @DongjiGao in #4846
- Update README.md related to ASR architecture by @m-koichi in #4865
- Minor fix to librimix diar recipe by @ftshijt in #4867
- Add Full Whisper Model for Finetuning by @slSeanWU in #4793
- Add torchaudio version check for HuBERT pretraining by @simpleoier in #4872
- add k2 decoder related scripts for EURO by @DongjiGao in #4868
- EURO: small fix (temporarily remove support for nbest_rescoring) by @DongjiGao in #4875
- Add description for Whisper ASR in homepage readme by @slSeanWU in #4877
- Update README.md by @eltociear in #4879
- add explanations to text tokenizing related scripts and remove unused script by @DongjiGao in #4880
- update information about source and our modification for k2 related scripts by @DongjiGao in #4881
- AphasiaBank ASR recipe by @tjysdsg in #4860
- Multilingual SUPERB update by @ftshijt in #4878
- ESPnet Unsupervised ASR (EURO project) by @ftshijt in #4774
- Support ProDiff in TTS by @Fhrozen in #4808
- Add E-Branchformer for GigaSpeech by @pyf98 in #4882
- FLEURS - Auxiliary CTC conditioning tasks by @wanchichen in #4756
- Add python 3.8 requirement for Whisper & update tests by @slSeanWU in #4891
- Update some ASR results in the main readme file by @pyf98 in #4883
- Add Conv2dSubsampling1 module and test it in AphasiaBank ASR recipe by @tjysdsg in #4892
- Support x-vector extractor based on RawNet by @Takaaki-Saeki in #4884
- single language track setups by @DanBerrebbi in #4895
- fixing bug deu1 by @DanBerrebbi in #4900
- Fix dataprep issues based on updated data release via Google form by @roshansh-cmu in #4899
- Add a new EGS2 recipe 'reazonspeech' by @fujimotos in #4885
- Update version to 202301 by @kan-bayashi in #4901
New Contributors
- @DongjiGao made their first contribution in #4791
- @jerryuhoo made their first contribution in #4689
- @m-koichi made their first contribution in #4845
- @fujimotos made their first contribution in #4885
Full Changelog: v.202211...v.202301
ESPnet version 202211
What's Changed
- Update muskits update by @ftshijt in #4616
- Muskit installation by @A-Quarter-Mile in #4617
- Sync Muskits branch with Master by @ftshijt in #4640
- Updates on Muskit Migration by @A-Quarter-Mile in #4631
- Update Muskits branch by @ftshijt in #4662
- Add stage 5 & stage 6 by @A-Quarter-Mile in #4649
- Muskit: rename & reorganize features by @A-Quarter-Mile in #4668
- Update Muskits branch by @ftshijt in #4671
- Muskits CI fixing by @ftshijt in #4672
- Muskits CI fix by @ftshijt in #4673
- Muskits - apply isort by @ftshijt in #4677
- Muskits CI fix by @ftshijt in #4678
- Muskit: Add tokenizer by @A-Quarter-Mile in #4676
- Muskits - various fix for CI test by @ftshijt in #4679
- Muskit: add recipe ofuton by @A-Quarter-Mile in #4681
- Muskits (CI fix) by @ftshijt in #4682
- Fix CI issue in muskits by @ftshijt in #4687
- Add dns_icassp22 Speech Enhancement Recipe by @slSeanWU in #4657
- Singing Voice Synthesis Task for ESPnet by @ftshijt in #4670
- Documentation of Tutorial and Muskits by @ftshijt in #4692
- Add tests on MacOS and Windows (only installation) by @kamo-naoyuki in #4669
- Add missing entries in readme by @ftshijt in #4699
- Support ST without texts in source language by @sophia1488 in #4688
- Update ConvInput for Transducer by @b-flo in #4720
- Small changes for standalone Transducer by @b-flo in #4722
- Fix input block tutorial documentation for Transducer by @b-flo in #4724
- Fix HF Pytest Errors by @siddhu001 in #4737
- Update to puebla-nahuatl recipe (some minor fixes) by @ftshijt in #4713
- Add espnet2 TTS recipe on M-AILABS by @Takaaki-Saeki in #4701
- Update outdated enh config files by @Emrys365 in #4719
- add src_sos & src_eos for mt task to address the index out of range w… by @simpleoier in #4736
- Add g2pk_explicit_space tokenizer by @jonghwanhyeon in #4718
- Fix JETS inference with GST (#4743) by @kan-bayashi in #4744
- Update on Muskit by @A-Quarter-Mile in #4700
- add fleurs conformer+sc-ctc results by @wanchichen in #4746
- Add recipe for OCR task on IAM handwriting dataset by @kenzheng99 in #4707
- Add Talromur2 recipe by @G-Thor in #4680
- Add multi-channel enh_asr for CHiME-4 by @YoshikiMas in #4706
- chunk_mask error by @aky15 in #4751
- fix wav2vec2 encoder mask bug by @simpleoier in #4772
- Add Hugging Face Transformers Decoder, Tokenizer and their example on SLURP by @akreal in #4099
- [Recipe PR] MELD: Multimodal EmotionLines Dataset by @realzza in #4771
- MultiIRIS follow up by @YoshikiMas in #4765
- Add CATSLU results for XLS-R with mBART-50 by @akreal in #4782
- Add MEDIA and PortMEDIA results for XLS-R with mBART-50 by @akreal in #4794
- Add SLUE-VoxPopuli results for WavLM with mBART-50 by @akreal in #4777
- Follow up for SLURP and CATSLU by @akreal in #4796
- Update README in chime4/enh_asr1 by @YoshikiMas in #4795
- fix parsing token_list by @imdanboy in #4778
- Use torchaudio functions for beamforming related operations in torch 1.12.1+ by @Emrys365 in #4638
- PIT E2E multi-speaker ASR and librimix recipe by @simpleoier in #4753
- Fix an audio format issue in some enh recipes by @YoshikiMas in #4799
- Fixing How2-2000h Data preparation and Seq Length Assert for Longformer Encoder by @roshansh-cmu in #4805
- Adding MFA scripts for LJSpeech by @iamanigeeit in #4801
- fix typo in espnet2_tutorial.md by @eltociear in #4811
- [WIP] E-Branchformer Encoder in ESPnet2 by @kkim-asapp in #4812
- Muskit update by @A-Quarter-Mile in #4783
New Contributors
- @A-Quarter-Mile made their first contribution in #4617
- @sophia1488 made their first contribution in #4688
- @kenzheng99 made their first contribution in #4707
- @realzza made their first contribution in #4771
- @iamanigeeit made their first contribution in #4801
- @eltociear made their first contribution in #4811
- @kkim-asapp made their first contribution in #4812
Full Changelog: v.202209...v.202211
ESPnet version 202209
What's Changed
- Add dynamic mixing in the speech separation task. by @LiChenda in #4387
- Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in #4560
- Offline/Online (standalone) ESPnet2 Transducer by @b-flo in #4479
- Unfix matplotlib version by @kamo-naoyuki in #4576
- use torch.finfo for dtype other than float by @wenzhe-nrv in #4584
- Update recipe for slurp-entity by @ftshijt in #4585
- Egs2 aesrc by @brianyan918 in #4592
- update checks for bias in initialization by @LiChenda in #4574
- [WIP] Update to fit the recent update in s3prl. by @simpleoier in #4593
- Unfix numpy version by @kamo-naoyuki in #4598
- Update to fit the recent update in s3prl. by @simpleoier in #4600
- Add improved results on FLEURS dataset by @wanchichen in #4596
- Update mp4_to_wav.sh by @jaehyun-ko in #4605
- Pass output_dir as str to wandb.init() by @jonghwanhyeon in #4607
- Support enh_s2t joint training on multi-speaker data by @Emrys365 in #4566
- Add ASR results for commonvoice zh_TW by @slSeanWU in #4612
- Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in #4609
- recipe config update by @ftshijt in #4621
- Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in #4604
- New SLU task by @siddhu001 in #4569
- Joss paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in #4620
- Update conformer result of AMI corpus by @teinhonglo in #4629
- Offline/Online Branchformer Transducer by @b-flo in #4582
- Change to install numba using pip instead of conda by @kamo-naoyuki in #4637
- Add MixIT support. It is unsupervised only. Semi-supervised config is not available for now. by @simpleoier in #4619
- Add 2-pass SLU code for FSC Challenge by @siddhu001 in #4636
- CI fix and some other minor recipe fixes by @ftshijt in #4656
- Update the title of plots to be y-label vs x-label by @pyf98 in #4647
- Update VIVOS download link by @hieuthi in #4644
- Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in #4635
- Amend to CI fix by @ftshijt in #4663
- qasr update by @massabaali7 in #4642
- Open_li110 for large-scale multilingual speech by @ftshijt in #4408
- Fix the path of calculate_rtf.py by @sw005320 in #4660
- Fix importlib-metadata version by @kan-bayashi in #4686
- Cmu arctic tts pretrain finetune by @soumimaiti in #4456
- updated version to 202209 by @kan-bayashi in #4685
New Contributors
- @wenzhe-nrv made their first contribution in #4584
- @jaehyun-ko made their first contribution in #4605
- @jonghwanhyeon made their first contribution in #4607
- @slSeanWU made their first contribution in #4612
- @massabaali7 made their first contribution in #4642
- @soumimaiti made their first contribution in #4456
Full Changelog: v.202207...v.202209
ESPnet version 202207
New Features
- [New Features][ESPnet1][ASR] Add DDP support for v1 ASR training. #4430 by @lazykyama
- [New Features][ESPnet2] Support tensorboard graph #4418 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Branchformer Encoder in ESPnet2 #4400 by @pyf98
- [New Features][ESPnet2][Diarization][SE] enh_diar joint model #4339 by @YushiUeda
- [New Features][ESPnet2][ESPnet1] Calculate RTF and latency in espnet2 #4382 by @espnetUser
- [New Features][ESPnet2][ESPnet1][SE] Add EnhPreprocessor for Speech Enhancement #4321 by @Emrys365
- [New Features][ESPnet2][SE] Add DPTNet and WarmupStepLR scheduler #4449 by @Emrys365
- [New Features][ESPnet2][SE] Add support for calculating losses on noise and dereverberated signals #4476 by @Emrys365
Recipe
- [Recipe][ESPnet2] Aishell-2 GPU info #4501 by @jctian98
- [Recipe][ESPnet2] Fix librispeech default path to signify auto download #4517 by @karthik19967829
- [Recipe][ESPnet2] Recipe fix for PueblaNahuatl Recipe #4522 by @ftshijt
- [Recipe][ESPnet2][ASR][README] Add Aishell-2 ASR Recipe for Espnet2 #4451 by @jctian98
- [Recipe][ESPnet2][ASR][README] Add AmericasNLP 2022 baselines #4428 by @akreal
- [Recipe][ESPnet2][ESPnet1][ASR][Installation] FLEURS ASR Recipe for ESPnet2 #4455 by @wanchichen
- [Recipe][ESPnet2][ESPnet1][ASR][README] tedx_spanish_corpus egs2 recipe #4523 by @jessicah25
- [Recipe][ESPnet2][ESPnet1][ASR][SE] Adding L3DAS22 Task1 model to ESPnet-SE #3994 by @popcornell
- [Recipe][ESPnet2][ESPnet1][ST] Must_C v1 and v2 in egs2 #4306 by @brianyan918
- [Recipe][ESPnet2][README] Dcase task1 Baseline #4317 by @siddhu001
- [Recipe][ESPnet2][README] Report Aishell-2 Transducer results #4489 by @jctian98
- [Recipe][ESPnet2][README] Update language codes in AmericasNLP 2022 baseline #4441 by @akreal
- [Recipe][ESPnet2][README] Vox populi baseline #4478 by @siddhu001
- [Recipe][ESPnet2][SE] L3DAS22 enhancement recipe #4269 by @neillu23
- [Recipe][ESPnet2][SE] Update notes in the recipes for DNS challenges #4433 by @YoshikiMas
- [Recipe][ESPnet2][SE][SLU][ST] LT-Spatialized and SLURP-Spatialized combined enhancement recipe #4268 by @neillu23
- [Recipe][ESPnet2][ST] Add moses check for ST recipes #4417 by @ftshijt
- [Recipe][ESPnet2][TTS] Add talromur recipe #4379 by @G-Thor
- [Recipe][ESPnet2][TTS] Fix for issue #4401 #4402 by @G-Thor
- [Recipe][ESPnet2][TTS] add pre-trained model jets in the recipe of ljspeech, kss #4406 by @imdanboy
Bugfix
- [Bugfix][ESPnet1] fix the corrupted pretrained model #4490 by @wentaoxandry
- [Bugfix][ESPnet1][ESPnet2] Fix an4 URL #4427 by @pyf98
- [Bugfix][ESPnet1][ESPnet2][RNNT] Fix mAES with big vocab size #4312 by @b-flo
- [Bugfix][ESPnet2] Adding __init__.py to espnet2/diar/layers and espnet2/diar/separator #4470 by @cycentum
- [Bugfix][ESPnet2] Fix tensorboard-graph creation for multi gpu mode #4431 by @kamo-naoyuki
- [Bugfix][ESPnet2] Update char_tokenizer.py #4499 by @xiabingquan
- [Bugfix][ESPnet2][ESPnet1][ASR][LM][MT][TTS] Fix Transducer LM fusion and add Logging for Transducer inference #4327 by @chintu619
- [Bugfix][ESPnet2][SE] Fix a bug in enh unit test #4435 by @Emrys365
Enhancement
- [Enhancement][ESPnet2] Optionize graph creation #4551 by @kan-bayashi
- [Enhancement][ESPnet2][Installation][TTS] Add icelandic g2p #4384 by @G-Thor
- [Enhancement][ESPnet2][SE] Add support of test-only criterions after each epoch #4381 by @Emrys365
- [Enhancement][ESPnet2][SSL] raise more useful error in espnet2/asr/frontend/s3prl.py if s3prl is not installed #4480 by @popcornell
- [Enhancement][ESPnet2][TTS] Add JETS AlignmentModule in calculate_all_attentions.py #4446 by @seastar105
Refactoring
- [Refactoring][ESPnet1] Refactoring 'is_prefix' function #4530 by @jhlee9010
- [Refactoring][ESPnet2][ASR] Zero_infinity option for ctc loss #4415 by @kamo-naoyuki
Others
- [CI][ESPnet1][ESPnet2][Installation] Remove the version restriction for numpy #4419 by @kamo-naoyuki
- [CI][ESPnet2] Changed to install espnet from wheel in the test_import CI test #4471 by @kamo-naoyuki
- [CI][Installation] Temporarily fixed numpy version #4464 by @kamo-naoyuki
- [Documentation] Add notes on batch size and num of GPUs in ESPnet2 documentation #4436 by @pyf98
- [Documentation][ESPnet1] Update decoder.py #4322 by @sw005320
- [Documentation][ESPnet2] Add a note to follow the installation instructions #4477 by @akreal
Acknowledgements
Special thanks to @Emrys365, @G-Thor, @YoshikiMas, @YushiUeda, @akreal, @b-flo, @brianyan918, @chintu619, @cycentum, @espnetUser, @ftshijt, @imdanboy, @jctian98, @jessicah25, @jhlee9010, @kamo-naoyuki, @kan-bayashi, @karthik19967829, @lazykyama, @neillu23, @popcornell, @pyf98, @seastar105, @siddhu001, @sw005320, @wanchichen, @wentaoxandry, @xiabingquan.