@RayYuki

New Features

[New Features][ESPnet2][Codec] Add HiFiCodec model #5898 by @RayYuki

Enhancement

[Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki

Recipe

[Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
[Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
[Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
[Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz

Bugfix

[Bugfix][ESPnet2][Diarization] Fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
[Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
[Bugfix][ESPnet2][ASR] removed 'continue' statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
[Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
[Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen

Documentation

[Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
[Documentation] Update CI reference #5939 by @emmanuel-ferdman
[Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
[Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon

Others

[Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
[Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
[Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay

Acknowledgements

Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.

@wyh2000

New Features

[New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
[New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
[New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
[New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
[New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
[New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
[New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti

Enhancement

[Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
[Enhancement][ESPnet2][Codec] Categorical Balanced Chunk iterator #5894 by @ftshijt
[Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
[Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712

Recipe

[Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
[Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
[Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
[Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
[Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
[Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
[Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
[Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
[Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
[Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
[Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
[Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
[Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
[Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
[Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
[Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
[Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
[Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
[Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
[Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
[Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
[Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
[Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
[Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
[Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
[Recipe][ESPnet2][ASR] add interspeech2024_dsu_challenge/asr2 #5627 by @simpleoier
[Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt

Bugfix

[Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
[Bugfix] Bugfix/homepage #5885 by @Masao-Someki
[Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
[Bugfix] Bug fix for source link #5883 by @Masao-Someki
[Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
[Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
[Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
[Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
[Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
[Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
[Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
[Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
[Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
[Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
[Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
[Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
[Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
[Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
[Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
[Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
[Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
[Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
[Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo

Documentation

[Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
[Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
[Documentation] Add script to make release note from milestone #5653 by @kan-bayashi

Refactoring

[Refactoring] Modified easy to ez #5719 by @Masao-Someki

Others

[Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
[Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
[Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
[Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
[Others] Update README info #5852 by @ftshijt
[Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
[Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
[Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
[Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
[Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
[Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
[Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
[Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
[Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
[Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
[Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
[Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
[Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
[Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
[Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
[Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba

Acknowledgements

Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.

@LiChenda

News

This release introduces two major new features: espnetez and ESPnet-SPK.

New Features

[New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-based SE model to ESPnet-SE #5572 by @LiChenda
[New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
[New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
[New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
[New Features][ESPnet2][CI][SID] ESPnet-SPK: major update #5408 by @Jungjee
[New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki

Enhancement

[Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
[Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
[Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
[Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
[Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt

Recipe

[Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
[Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
[Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
[Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
[Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
[Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
[Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
[Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
[Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
[Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
[Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
[Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
[Recipe][ESPnet1] Added clean speech results #5649 by @linan2
[Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
[Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
[Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
[Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001

Bugfix

[Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
[Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
[Bugfix] Fix CI #5642 by @siddhu001
[Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
[Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
[Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
[Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
[Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
[Bugfix][ESPnet2] Fix except #5567 by @takenori-y
[Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
[Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
[Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
[Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
[Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso

Documentation

[Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
[Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23

Others

[Others] Update Discord Invitation Link #5578 by @Fhrozen
[Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.

@pengchengguo

What's Changed

Support arbitrary language finetune for Whisper models. by @pengchengguo in #5344
Update Dipco Data URL by @Fhrozen in #5391
Update readme in TEMPLATE/svs1 by @linyueqian in #5394
add gramvaani asr recipe by @bloodraven66 in #5366
ESPnet-SPK: sampler by @Jungjee in #5365
Adding general data augmentation methods for speech preprocessing by @Emrys365 in #5370
Update of several SE recipes and some minor fixes by @Emrys365 in #5401
Reproducing MIMOIRIS by @YoshikiMas in #5409
Kathbath asr by @bloodraven66 in #5369
Add pytorch2.0.1 to CI by @kamo-naoyuki in #5413
[skip ci] Update README.md by @kamo-naoyuki in #5417
In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in #5416
Docker updates for local builds by @Fhrozen in #5406
fix typo in TEMPLATE/svs1/README.md by @linyueqian in #5426
Update install_mwerSegmenter.sh by @sw005320 in #5437
Support Whisper-style training as a new task S2T by @pyf98 in #5120
fix twice numpy installation issue by @kan-bayashi in #5447
Add Whisper SOT recipe for Librimix by @LiChenda in #5371
Update for the JOSS paper editor review by @neillu23 in #5418
Add the VOiCES recipe for ASR by @Emrys365 in #5448
Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in #5445
[WIP] create recipe for acesinger by @linyueqian in #5431
Add BibleTTS recipe by @wyh2000 in #5436
ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in #5434
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5427
Simple fix to reduce test_slu_inference time by @siddhu001 in #5460
Do not use root logger in Beamsearch by @vsd-vector in #5454
Fix whisper test by @siddhu001 in #5464
Add doc for OWSM by @pyf98 in #5463
Speech-to-speech translation Task by @ftshijt in #4859
AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in #5456
Support LoRA based large model finetuning. by @pengchengguo in #5400
Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in #5323
Add phonemized LibriTTS ASR recipe by @akreal in #5466
Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in #5414
speed up TFGridNet code by @zqwang7 in #5395
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5468
ASR2 recipe on Tedlium3 dataset by @kohei0209 in #5331
Create README.md in OWSM v1 by @pyf98 in #5489
Update setup.py by @sw005320 in #5490
Fix default value in ML-SUPERB by @ftshijt in #5492
Fix bugs of Whisper SOT. by @pengchengguo in #5494
Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in #5441
Add a new SE recipe combining five public corpora by @Emrys365 in #5484
Update .mergify.yml by @kamo-naoyuki in #5502
update version to 202310 by @kan-bayashi in #5501

New Contributors

@linyueqian made their first contribution in #5394
@mdecerbo made their first contribution in #5416
@zuazo made their first contribution in #5445
@wyh2000 made their first contribution in #5436
@yichen14 made their first contribution in #5434
@vsd-vector made their first contribution in #5454
@ms-dot-k made their first contribution in #5456
@juice500ml made their first contribution in #5323
@kohei0209 made their first contribution in #5331

Full Changelog: v.202308...v.202310

@ftshijt

What's Changed

Update tutorial by @ftshijt in #4648
Update tutorials by @ftshijt in #4898
add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in #5130
Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in #5162
[SVS] Add new recipes by @A-Quarter-Mile in #5158
Update README.md of CHiME-7 DASR: fixing typos by @popcornell in #5166
Fix typo in CONTRIBUTING.md by @eltociear in #5167
CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in #5168
Update TD-SpeakerBeam by @Emrys365 in #5155
Add pre-trained causal speech separation model and streaming demo by @LiChenda in #5172
KSC recipe by @khassanoff in #5171
[SVS] Add new recipe by @A-Quarter-Mile in #5173
Update AphasiaBank Recipe by @tjysdsg in #5104
fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in #5159
Add installer for ParallelWaveGAN by @ftshijt in #4052
[GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in #5123
[SVS] Update docs README.md by @South-Twilight in #5178
Update SVS README.md by @jerryuhoo in #5180
Adding eendss models by @soumimaiti in #5157
2022fall new task tutorial by @ftshijt in #5186
[SVS] Updates for recipes by @A-Quarter-Mile in #5187
[GAN SVS] fix phoneme predictor by @jerryuhoo in #5188
Update generate_librimix_sd.sh by @leepeiying in #5182
Bug fix for #5195 by @YosukeHiguchi in #5196
[SVS] Update on recipes by @A-Quarter-Mile in #5197
Update preprocessor.py by @sw005320 in #5200
Minor fixes for ML-SUPERB by @ftshijt in #5202
Quick fix for whisper specaug by @siddhu001 in #5206
espnet-spk data preparation part by @Jungjee in #5184
Fix M4singer multi-spk recipe by @ftshijt in #5201
Update Dataset link for mlsuperb by @ftshijt in #5216
Fix bug when score_type is set to normal in ml_superb by @ftshijt in #5217
Add new functions and fix some bugs in SE by @Emrys365 in #5193
Update import order by @ftshijt in #5229
Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in #5228
Standalone Transducer v1.1 by @b-flo in #5140
Small fixes for Transducer by @b-flo in #5247
add asr2 task and librispeech recipe as an example. by @simpleoier in #5181
fix norm compatibility in scale discriminator by @kan-bayashi in #5240
CFSD, SECS metrics for TTS by @imdanboy in #5235
Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in #5246
Fix bugs in mfa_format.py by @G-Thor in #5223
New features for SVS by @ftshijt in #5245
re-fix norm compatibility in scale discriminator by @kan-bayashi in #5249
add conv1d subsampling 3 and egs2/librispeech/asr2 wavlm_large_21 kmeans (1000/2000) results by @simpleoier in #5252
Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in #5212
Fix a bug in score script for ML-SUPERB by @ftshijt in #5254
Refactor prep_segments in SVS by @jerryuhoo in #5210
A minor fix for num_splits_ssl for training by @ftshijt in #5262
[SVS] add singing tacotron by @A-Quarter-Mile in #5233
Add script to use speaker averaged xvectors in TTS training by @G-Thor in #5244
Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in #5267
Some name update for ml-superb by @ftshijt in #5276
Add support for K2 pruned transducer loss by @b-flo in #5268
Fix Transducer doc by @b-flo in #5306
Update installation.md by @kamo-naoyuki in #5291
Update install_nkf.sh by @sw005320 in #5300
Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in #5310
Update README.md by @sw005320 in #5315
Update setup.py by @sw005320 in #5316
Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in #5251
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5294
A few updates for asr2 and hubert by @simpleoier in #5285
Add decode_options and hyp_cleaner in evaluate_whisper_inference by @pyf98 in #5272
update pyworld version by @kan-bayashi in #5319
fix a data preparation issue for librimix recipe. by @LiChenda in #5322
Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in #5289
fix the s3prl frontend gradient backprop bug, ensuring feature_grad_mult=1.0 by @simpleoier in #5297
ESPNet-SPK part 2 - training by @Jungjee in #5258
remove some tests in espnet1 integration test by @sw005320 in #5328
Fix random segments by @iamanigeeit in #5274
Skip CI for draft PR by @ftshijt in #5333
Update cancel.yml by @kan-bayashi in #5334
Update several SE recipes and bash scripts by @Emrys365 in #5327
Add PULL_REQUEST_TEMPLATE.md by @kan-bayashi in #5340
ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in #5314
Minimize espnet2 integration test by @kan-bayashi in #5324
PR Labels for CI control by @Fhrozen in #5320
Split ci into several jobs by @kan-bayashi in #5343
Update CONTRIBUTING.md by @sw005320 in #5335
Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in #5341
Fix documentation skip CI by @Fhrozen in #5351
Update the usage by @sw005320 in #5349
Docker Update by @Fhrozen in #5321
Update installation.md by @sw005320 in #5348
Fix doc condition by @kan-bayashi in #5355
Update issue templates by @sw005320 in #5357
Update Contribution.md by @Fhrozen in #5352
Fix .mergify condition by @kan-bayashi in #5354
Reduce ffmpeg installation time in ci by @kan-bayashi in #5356
Update CI table by @kan-bayashi in #5359
Clean workflow files by @kan-bayashi in #5360
Couple of tweaks for asr2.sh for the HF hub upload by @akreal in #5362
Update TEMPLATE_HF_Readme.md (fix bash typo) by @akreal in #5361
Add discrete-token ASR for LibriSpeech 100h by @akreal in #5350
Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in #5342
Fix bug in ngram training in slu.sh by @siddhu001 in #5364
Add musdb18 recipe for music source separation by @Emrys365 in #5338
Bugfix: JETS CTCLoss by @imdanboy in #5288
Check the value of n_shift == upsample_factor in GAN_TTS by @i...

@simpleoier

What's Changed

Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in #4888
Apply the latest black by @kamo-naoyuki in #4907
Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in #4906
How2 fix README, incorrect url by @roshansh-cmu in #4902
standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in #4905
Fix typo in lrs/README.md by @eltociear in #4911
MSUPERB setting update by @ftshijt in #4913
Update test_import.yaml to install numba by @kamo-naoyuki in #4918
update pyopenjtalk version to 0.3.0 by @kan-bayashi in #4912
CHiME-7 Task1 recipe by @popcornell in #4894
Update CHiME-7 Task 1 README.md by @popcornell in #4920
Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in #4922
Add few shot subset for mSuperb multilingual setting by @guapaQAQ in #4923
Fix existing bugs in the TSE task by @Emrys365 in #4915
IAM OCR recipe updates by @kenzheng99 in #4927
Fixing some issues with chime7-task1 baseline by @popcornell in #4925
set default none decoder for ASR by @ftshijt in #4917
Update inference and training setting for mSuperb multilingual model by @guapaQAQ in #4932
Add E-Branchformer Transducer results by @pyf98 in #4933
add tf-gridnet by @zqwang7 in #4864
Fixes + Channel Selection for CHiME-7 Task by @popcornell in #4934
fix extracted feature dummy generation by @roshansh-cmu in #4926
Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in #4941
CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in #4942
Update show_asr_result.sh by @kamo-naoyuki in #4943
CHiME-7 DASR correct development results by @popcornell in #4946
Fix 'floordiv is deprecated' warnings by @fujimotos in #4945
Added WSLII installation instruction by @sw005320 in #4949
Update Muskits by @A-Quarter-Mile in #4931
Set a longer time execution threshold for related failed time-outs CI by @ftshijt in #4962
Modify data prep for mSUPERB multilingual by @guapaQAQ in #4965
Add E-Branchformer results in some recipes by @pyf98 in #4958
Add 'six' as a required Python module by @fujimotos in #4964
add msuperb linguistic analysis by @hhhaaahhhaa in #4938
Fix a 'ref_channel'-related issue in espnet2/bin/enh_inference.py by @Emrys365 in #4972
Add E-Branchformer results in slurp_entity by @pyf98 in #4971
Add Conformer and E-Branchformer results in fisher_spanish_callhome ASR by @pyf98 in #4976
[SVS] Add Joint-training by @A-Quarter-Mile in #4977
Update the chunk iterator for the TSE task by @Emrys365 in #4929
update msuperb LID scoring script by @hhhaaahhhaa in #4979
add multilingual+lid lid score generation by @hhhaaahhhaa in #4982
Add python=3.10 to CI by @kamo-naoyuki in #4627
LID score v2 by @hhhaaahhhaa in #4983
Fix ci by @kamo-naoyuki in #4985
Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in #4986
Remove six by @kamo-naoyuki in #4988
Modify format_wav_scp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in #4997
Fix Whisper tokenizer CI error by @slSeanWU in #5004
fix s3prl upstream attribute bug by @jwrh in #5003
[Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in #4994
Fix typeguard version by @silvanocerza in #5009
Add .pre-commit-config.yaml by @kamo-naoyuki in #5011
Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in #4998
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5012
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5014
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5015
[Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in #5019
Modify asr.sh by @kamo-naoyuki in #5020
[SVS] Improve visinger by @jerryuhoo in #5022
Use scripts/utils/print_args.sh instead of pyscripts/utils/print_args.py by @kamo-naoyuki in #5025
Add docstring in extra_path.sh by @kamo-naoyuki in #5028
Update installation.md by @kamo-naoyuki in #5029
Update README.md by @kamo-naoyuki in #5030
Update README.md by @kamo-naoyuki in #5031
Change bc to python by @kamo-naoyuki in #5032
Update tools/Makefile and path.sh by @kamo-naoyuki in #5027
Fix for format_wav_scp.py by @kamo-naoyuki in #5038
Add execute permission to install_ice_g2p.sh by @kamo-naoyuki in #5040
Bug fix of #5025 by @kamo-naoyuki in #5039
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5041
Update README.md by @kamo-naoyuki in #5042
Update README.md by @kamo-naoyuki in #5043
Update README.md by @kamo-naoyuki in #5045
Fix in gen_task1_data.sh from CHiME7 by @boeddeker in #4953
Update README.md by @eml914 in #5044
Add installers/install_ffmpeg.sh by @kamo-naoyuki in #5046
Fix broken links reported by #5048 by @ShigekiKarita in #5050
fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in #4978
Add miniconda in gitignore by @pyf98 in #5052
CHiME-7 DASR fixes from participants feedback by @popcornell in #4999
Fix the condition for maxlen warning in beam search by @pyf98 in #5055
Fixed SQLalchemy version for MFA by @Fhrozen in #5059
Support Multi-Blank Transducer in Espnet2 by @jctian98 in #4876
Fix chime7 DASR task1 run.sh by @kamo-naoyuki in #5060
CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in #5061
Add test_format_wav_scp_sh.bats by @kamo-naoyuki in #5062
Update documentation by @kamo-naoyuki in #5063
Support SOT training on LibriMix data. by @pengchengguo in #4861
Update check_install.py by @kamo-naoyuki in #5066
Tedlium3 recipe by @Some-random in #5068
Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in #5074
Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in #5075
Clean ML-SUPERB by @ftshijt in #5067
CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in #5054
Chime7-task1 diarization (updated results) by @popcornell in #5088
Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in #5084
[SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pu...

@ftshijt

What's Changed

Initialize VISinger branch by @ftshijt in #4683
Update VISInger branch by @ftshijt in #4705
Update UASR branch with latest ESPnet functions by @ftshijt in #4752
Update uasr by @ftshijt in #4770
Shell scripts for UASR processing by @ftshijt in #4769
Uasr python scripts by @DongjiGao in #4791
Update visinger by @ftshijt in #4818
Update test_custom_transducer.py by @sw005320 in #4826
Update asr.sh by @sw005320 in #4827
Fixed pad mode for librosa.stft by @Masao-Someki in #4832
Add E-Branchformer models in some recipes by @pyf98 in #4833
Fix data prep in GigaSpeech by @pyf98 in #4836
time sync decoding for asr by @brianyan918 in #4792
Remove duplicated VOXFORGE in db.sh (line81 and line157) by @pyf98 in #4840
Fix argument parsing for non_linguistic_symbols in asr.sh by @pyf98 in #4841
Add a warning statement when the hypo length equals to the max out length. by @pengchengguo in #4843
Add target speaker extraction (TSE) functions by @Emrys365 in #4823
Multilingual superb by @ftshijt in #4824
VISinger by @jerryuhoo in #4689
Update VISInger to latest by @ftshijt in #4849
VISinger for singing voice synthesis by @ftshijt in #4848
Reduce word counts for ESPnet-SE++ Joss paper by @neillu23 in #4844
Add E-Branchformer configs and models in ASR recipes by @pyf98 in #4837
Address Muskits updates on README by @ftshijt in #4850
Minor fix for MSUPERB recipe by @ftshijt in #4851
Update for the latest changes in the draft (minor changes) by @neillu23 in #4852
Add E-Branchformer results on Librispeech by @kkim-asapp in #4856
Update hubert implementation. by @simpleoier in #4747
VISinger unit test by @jerryuhoo in #4855
Minor fix to commonvoice espnet1 by @ftshijt in #4862
[WIP] Add S4 decoder in ESPnet2 by @m-koichi in #4845
Update hubert feature and acknowledge information in related Readmes. by @simpleoier in #4863
Generating MFA alignments by @Fhrozen in #4803
[WIP] EURO uasr scripts by @DongjiGao in #4846
Update README.md related to ASR architecture by @m-koichi in #4865
Minor fix to librimix diar recipe by @ftshijt in #4867
Add Full Whisper Model for Finetuning by @slSeanWU in #4793
Add torchaudio version check for HuBERT pretraining by @simpleoier in #4872
add k2 decoder related scripts for EURO by @DongjiGao in #4868
EURO: small fix (temporarily remove support for nbest_rescoring) by @DongjiGao in #4875
Add description for Whisper ASR in homepage readme by @slSeanWU in #4877
Update README.md by @eltociear in #4879
add explanations to text tokenizing related scripts and remove unused script by @DongjiGao in #4880
update information about source and our modification for k2 related scripts by @DongjiGao in #4881
AphasiaBank ASR recipe by @tjysdsg in #4860
Multilingual SUPERB update by @ftshijt in #4878
ESPnet Unsupervised ASR (EURO project) by @ftshijt in #4774
Support ProDiff in TTS by @Fhrozen in #4808
Add E-Branchformer for GigaSpeech by @pyf98 in #4882
FLEURS - Auxiliary CTC conditioning tasks by @wanchichen in #4756
Add python 3.8 requirement for Whisper & update tests by @slSeanWU in #4891
Update some ASR results in the main readme file by @pyf98 in #4883
Add Conv2dSubsampling1 module and test it in AphasiaBank ASR recipe by @tjysdsg in #4892
Support x-vector extractor based on RawNet by @Takaaki-Saeki in #4884
single language track setups by @DanBerrebbi in #4895
fixing bug deu1 by @DanBerrebbi in #4900
Fix dataprep issues based on updated data release via Google form by @roshansh-cmu in #4899
Add a new EGS2 recipe 'reazonspeech' by @fujimotos in #4885
Update version to 202301 by @kan-bayashi in #4901

New Contributors

@DongjiGao made their first contribution in #4791
@jerryuhoo made their first contribution in #4689
@m-koichi made their first contribution in #4845
@fujimotos made their first contribution in #4885

Full Changelog: v.202211...v.202301

@ftshijt

What's Changed

Update muskits update by @ftshijt in #4616
Muskit installation by @A-Quarter-Mile in #4617
Sync Muskits branch with Master by @ftshijt in #4640
Updates on Muskit Migration by @A-Quarter-Mile in #4631
Update Muskits branch by @ftshijt in #4662
Add stage 5 & stage 6 by @A-Quarter-Mile in #4649
Muskit: rename & reorganize features by @A-Quarter-Mile in #4668
Update Muskits branch by @ftshijt in #4671
Muskits CI fixing by @ftshijt in #4672
Muskits CI fix by @ftshijt in #4673
Muskits - apply isort by @ftshijt in #4677
Muskits CI fix by @ftshijt in #4678
Muskit: Add tokenizer by @A-Quarter-Mile in #4676
Muskits - various fix for CI test by @ftshijt in #4679
Muskit: add recipe ofuton by @A-Quarter-Mile in #4681
Muskits (CI fix) by @ftshijt in #4682
Fix CI issue in muskits by @ftshijt in #4687
Add dns_icassp22 Speech Enhancement Recipe by @slSeanWU in #4657
Singing Voice Synthesis Task for ESPnet by @ftshijt in #4670
Documentation of Tutorial and Muskits by @ftshijt in #4692
Add tests on MacOS and Windows (only installation) by @kamo-naoyuki in #4669
Add missing entries in readme by @ftshijt in #4699
Support ST without texts in source language by @sophia1488 in #4688
Update ConvInput for Transducer by @b-flo in #4720
Small changes for standalone Transducer by @b-flo in #4722
Fix input block tutorial documentation for Transducer by @b-flo in #4724
Fix HF Pytest Errors by @siddhu001 in #4737
Update to puebla-nahuatl recipe (some minor fixes) by @ftshijt in #4713
Add espnet2 TTS recipe on M-AILABS by @Takaaki-Saeki in #4701
Update outdated enh config files by @Emrys365 in #4719
add src_sos & src_eos for mt task to address the index out of range w… by @simpleoier in #4736
Add g2pk_explicit_space tokenizer by @jonghwanhyeon in #4718
Fix JETS inference with GST (#4743) by @kan-bayashi in #4744
Update on Muskit by @A-Quarter-Mile in #4700
add fleurs conformer+sc-ctc results by @wanchichen in #4746
Add recipe for OCR task on IAM handwriting dataset by @kenzheng99 in #4707
Add Talromur2 recipe by @G-Thor in #4680
Add multi-channel enh_asr for CHiME-4 by @YoshikiMas in #4706
chunk_mask error by @aky15 in #4751
fix wav2vec2 encoder mask bug by @simpleoier in #4772
Add Hugging Face Transformers Decoder, Tokenizer and their example on SLURP by @akreal in #4099
[Recipe PR] MELD: Multimodal EmotionLines Dataset by @realzza in #4771
MultiIRIS follow up by @YoshikiMas in #4765
Add CATSLU results for XLS-R with mBART-50 by @akreal in #4782
Add MEDIA and PortMEDIA results for XLS-R with mBART-50 by @akreal in #4794
Add SLUE-VoxPopuli results for WavLM with mBART-50 by @akreal in #4777
Follow up for SLURP and CATSLU by @akreal in #4796
Update README in chime4/enh_asr1 by @YoshikiMas in #4795
fix parsing token_list by @imdanboy in #4778
Use torchaudio functions for beamforming related operations in torch 1.12.1+ by @Emrys365 in #4638
PIT E2E multi-speaker ASR and librimix recipe by @simpleoier in #4753
Fix an audio format issue in some enh recipes by @YoshikiMas in #4799
Fixing How2-2000h Data preparation and Seq Length Assert for Longformer Encoder by @roshansh-cmu in #4805
Adding MFA scripts for LJSpeech by @iamanigeeit in #4801
fix typo in espnet2_tutorial.md by @eltociear in #4811
[WIP] E-Branchformer Encoder in ESPnet2 by @kkim-asapp in #4812
Muskit update by @A-Quarter-Mile in #4783

New Contributors

@A-Quarter-Mile made their first contribution in #4617
@sophia1488 made their first contribution in #4688
@kenzheng99 made their first contribution in #4707
@realzza made their first contribution in #4771
@iamanigeeit made their first contribution in #4801
@eltociear made their first contribution in #4811
@kkim-asapp made their first contribution in #4812

Full Changelog: v.202209...v.202211

@LiChenda

What's Changed

Add dynamic mixing in the speech separation task. by @LiChenda in #4387
Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in #4560
Offline/Online (standalone) ESPnet2 Transducer by @b-flo in #4479
Unfix matplotlib version by @kamo-naoyuki in #4576
use torch.finfo for dtype other than float by @wenzhe-nrv in #4584
Update recipe for slurp-entity by @ftshijt in #4585
Egs2 aesrc by @brianyan918 in #4592
update checks for bias in initialization by @LiChenda in #4574
[WIP] Update to fit the recent update in s3prl. by @simpleoier in #4593
Unfix numpy version by @kamo-naoyuki in #4598
Update to fit the recent update in s3prl. by @simpleoier in #4600
Add improved results on FLEURS dataset by @wanchichen in #4596
Update mp4_to_wav.sh by @jaehyun-ko in #4605
Pass output_dir as str to wandb.init() by @jonghwanhyeon in #4607
Support enh_s2t joint training on multi-speaker data by @Emrys365 in #4566
Add ASR results for commonvoice zh_TW by @slSeanWU in #4612
Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in #4609
recipe config update by @ftshijt in #4621
Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in #4604
New SLU task by @siddhu001 in #4569
Joss paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in #4620
Update conformer result of AMI corpus by @teinhonglo in #4629
Offline/Online Branchformer Transducer by @b-flo in #4582
Change to install numba using pip instead of conda by @kamo-naoyuki in #4637
Add MixIT support. It is unsupervised only. Semi-supervised config is not available for now. by @simpleoier in #4619
Add 2-pass SLU code for FSC Challenge by @siddhu001 in #4636
CI fix and some other minor recipe fixes by @ftshijt in #4656
Update the title of plots to be y-label vs x-label by @pyf98 in #4647
Update VIVOS download link by @hieuthi in #4644
Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in #4635
Amend to CI fix by @ftshijt in #4663
qasr update by @massabaali7 in #4642
Open_li110 for large-scale multilingual speech by @ftshijt in #4408
Fix the path of calculate_rtf.py by @sw005320 in #4660
Fix importlib-metadata version by @kan-bayashi in #4686
Cmu arctic tts pretrain finetune by @soumimaiti in #4456
updated version to 202209 by @kan-bayashi in #4685

New Contributors

@wenzhe-nrv made their first contribution in #4584
@jaehyun-ko made their first contribution in #4605
@jonghwanhyeon made their first contribution in #4607
@slSeanWU made their first contribution in #4612
@massabaali7 made their first contribution in #4642
@soumimaiti made their first contribution in #4456

Full Changelog: v.202207...v.202209

@lazykyama

New Features

[New Features][ESPnet1][ASR] Add DDP support for v1 ASR training. #4430 by @lazykyama
[New Features][ESPnet2] Support tensorboard graph #4418 by @kamo-naoyuki
[New Features][ESPnet2][ASR] Branchformer Encoder in ESPnet2 #4400 by @pyf98
[New Features][ESPnet2][Diarization][SE] enh_diar joint model #4339 by @YushiUeda
[New Features][ESPnet2][ESPnet1] Calculate RTF and latency in espnet2 #4382 by @espnetUser
[New Features][ESPnet2][ESPnet1][SE] Add EnhPreprocessor for Speech Enhancement #4321 by @Emrys365
[New Features][ESPnet2][SE] Add DPTNet and WarmupStepLR scheduler #4449 by @Emrys365
[New Features][ESPnet2][SE] Add support for calculating losses on noise and dereverberated signals #4476 by @Emrys365

Recipe

[Recipe][ESPnet2] Aishell-2 GPU info #4501 by @jctian98
[Recipe][ESPnet2] Fix librispeech default path to signify auto download #4517 by @karthik19967829
[Recipe][ESPnet2] Recipe fix for PueblaNahuatl Recipe #4522 by @ftshijt
[Recipe][ESPnet2][ASR][README] Add Aishell-2 ASR Recipe for Espnet2 #4451 by @jctian98
[Recipe][ESPnet2][ASR][README] Add AmericasNLP 2022 baselines #4428 by @akreal
[Recipe][ESPnet2][ESPnet1][ASR][Installation] FLEURS ASR Recipe for ESPnet2 #4455 by @wanchichen
[Recipe][ESPnet2][ESPnet1][ASR][README] tedx_spanish_corpus egs2 recipe #4523 by @jessicah25
[Recipe][ESPnet2][ESPnet1][ASR][SE] Adding L3DAS22 Task1 model to ESPNet-SE #3994 by @popcornell
[Recipe][ESPnet2][ESPnet1][ST] Must_C v1 and v2 in egs2 #4306 by @brianyan918
[Recipe][ESPnet2][README] Dcase task1 Baseline #4317 by @siddhu001
[Recipe][ESPnet2][README] Report Aishell-2 Transducer results #4489 by @jctian98
[Recipe][ESPnet2][README] Update language codes in AmericasNLP 2022 baseline #4441 by @akreal
[Recipe][ESPnet2][README] Vox populi baseline #4478 by @siddhu001
[Recipe][ESPnet2][SE] L3DAS22 enhancement recipe #4269 by @neillu23
[Recipe][ESPnet2][SE] Update notes in the recipes for DNS challenges #4433 by @YoshikiMas
[Recipe][ESPnet2][SE][SLU][ST] LT-Spatialized and SLURP-Spatialized combined enhancement recipe #4268 by @neillu23
[Recipe][ESPnet2][ST] Add moses check for ST recipes #4417 by @ftshijt
[Recipe][ESPnet2][TTS] Add talromur recipe #4379 by @G-Thor
[Recipe][ESPnet2][TTS] Fix for issue #4401 #4402 by @G-Thor
[Recipe][ESPnet2][TTS] add pre-trained model jets in the recipe of ljspeech, kss #4406 by @imdanboy

Bugfix

[Bugfix][ESPnet1] fix the corrupted pretrained model #4490 by @wentaoxandry
[Bugfix][ESPnet1][ESPnet2] Fix an4 URL #4427 by @pyf98
[Bugfix][ESPnet1][ESPnet2][RNNT] Fix mAES with big vocab size #4312 by @b-flo
[Bugfix][ESPnet2] Adding __init__.py to espnet2/diar/layers and espnet2/diar/separator #4470 by @cycentum
[Bugfix][ESPnet2] Fix tensorboard-graph creation for multi gpu mode #4431 by @kamo-naoyuki
[Bugfix][ESPnet2] Update char_tokenizer.py #4499 by @xiabingquan
[Bugfix][ESPnet2][ESPnet1][ASR][LM][MT][TTS] Fix Transducer LM fusion and add Logging for Transducer inference #4327 by @chintu619
[Bugfix][ESPnet2][SE] Fix a bug in enh unit test #4435 by @Emrys365

Enhancement

[Enhancement][ESPnet2] Optionize graph creation #4551 by @kan-bayashi
[Enhancement][ESPnet2][Installation][TTS] Add icelandic g2p #4384 by @G-Thor
[Enhancement][ESPnet2][SE] Add support of test-only criterions after each epoch #4381 by @Emrys365
[Enhancement][ESPnet2][SSL] raise more useful error in espnet2/asr/frontend/s3prl.py if s3prl is not installed #4480 by @popcornell
[Enhancement][ESPnet2][TTS] Add JETS AlignmentModule in calculate_all_attentions.py #4446 by @seastar105

Refactoring

[Refactoring][ESPnet1] Refactoring 'is_prefix' function #4530 by @jhlee9010
[Refactoring][ESPnet2][ASR] Zero_infinity option for ctc loss #4415 by @kamo-naoyuki

Others

[CI][ESPnet1][ESPnet2][Installation] Remove the version restriction for numpy #4419 by @kamo-naoyuki
[CI][ESPnet2] Changed to install espnet from wheel in the test_import CI test #4471 by @kamo-naoyuki
[CI][Installation] Temporarily fixed numpy version #4464 by @kamo-naoyuki
[Documentation] Add notes on batch size and num of GPUs in ESPnet2 documentation #4436 by @pyf98
[Documentation][ESPnet1] Update decoder.py #4322 by @sw005320
[Documentation][ESPnet2] Add a note to follow the installation instructions #4477 by @akreal

Acknowledgements

Special thanks to @Emrys365, @G-Thor, @YoshikiMas, @YushiUeda, @akreal, @b-flo, @brianyan918, @chintu619, @cycentum, @espnetUser, @ftshijt, @imdanboy, @jctian98, @jessicah25, @jhlee9010, @kamo-naoyuki, @kan-bayashi, @karthik19967829, @lazykyama, @neillu23, @popcornell, @pyf98, @seastar105, @siddhu001, @sw005320, @wanchichen, @wentaoxandry, @xiabingquan.
