SpeechLaughRecogniser

An ASR model for transcribing laughter in Speech Laugh audio

Dataset

Switchboard data

path=/deepstore/datasets/hmi/speechlaugh-corpus # global data path

Use gdown to download the data as a .zip file, then unzip it.

gdown 1VlQlyY3v3wtT2S047lwlTirWisz5mQ18 -O /path/to/data/switchboard.zip

#path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data # global datasets path

cd /path/to/data # e.g. /deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data

unzip switchboard.zip

# after unzip, the data will contain the following folders:
# - audio_wav
# - transcripts
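As a quick sanity check after unzipping, the expected layout can be verified from Python. This is a minimal sketch, not part of the repository: the helper name `missing_folders` is hypothetical, and the only assumption taken from the text above is that `audio_wav` and `transcripts` sit directly under the data folder.

```python
from pathlib import Path

# Folder names listed above as the contents of the unzipped Switchboard data.
EXPECTED_FOLDERS = ("audio_wav", "transcripts")


def missing_folders(data_root, expected=EXPECTED_FOLDERS):
    """Return the expected sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


# Example call (adjust the path to your own data location):
# missing = missing_folders("/path/to/data")
# if missing:
#     raise SystemExit(f"Missing folders after unzip: {missing}")
```

An empty list means both folders were found.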

Generate the audio_segments folder. It can be stored at the following path:

path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_segments
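This section does not say how the segments are generated, so the following is only an illustrative sketch of one way to cut a time span out of a WAV file with Python's standard `wave` module. The function name `cut_segment`, the output file naming, and the start/end times are all hypothetical; the repository's own preprocessing may work differently.

```python
import wave
from pathlib import Path


def cut_segment(src_wav, dst_wav, start_s, end_s):
    """Copy the [start_s, end_s) span of src_wav into dst_wav.

    Returns the number of audio frames written. Times are in seconds.
    """
    with wave.open(str(src_wav), "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both boundaries to the length of the recording.
        start = min(int(start_s * rate), total)
        end = min(int(end_s * rate), total)
        n_frames = max(end - start, 0)
        src.setpos(start)
        frames = src.readframes(n_frames)
        params = src.getparams()

    # Create the output folder (e.g. audio_segments) if it does not exist.
    Path(dst_wav).parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(dst_wav), "wb") as dst:
        dst.setparams(params)  # frame count in the header is fixed up on close
        dst.writeframes(frames)
    return n_frames


# Example call (hypothetical file names and times):
# cut_segment("audio_wav/example.wav", "audio_segments/example_0.wav", 1.5, 4.0)
```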

VocalSound

Download the VocalSound dataset and save it to the path/to/data/vocalsound_data folder.

wget -O vocalsound_16k.zip "https://www.dropbox.com/s/c5ace70qh1vbyzb/vs_release_16k.zip?dl=1"

#path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data

unzip vocalsound_16k.zip

After unzipping, the data is located at:

path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data/audio_16k

Other datasets (AMI, FSD50K-noisy, AudioSet, etc.)

Download these datasets from HuggingFace Datasets and save them to the data/huggingface_data folder.

First, point the HuggingFace datasets cache at this folder:

$ export HF_DATASETS_CACHE="../data/huggingface_data"

# or use the global datasets path

$ export HF_DATASETS_CACHE="/deepstore/datasets/hmi/speechlaugh-corpus/huggingface_data"
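The same setting can also be applied from inside a Python script instead of the shell. This is a minimal sketch with a hypothetical helper name, `set_hf_datasets_cache`. It relies on the `datasets` library reading `HF_DATASETS_CACHE` when it is imported, so the helper should be called before `import datasets`.

```python
import os
from pathlib import Path


def set_hf_datasets_cache(cache_dir):
    """Create cache_dir if needed and export it as HF_DATASETS_CACHE."""
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = str(path)
    return path


# Example call, to be placed before `import datasets`:
# set_hf_datasets_cache("../data/huggingface_data")
```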

Then download the datasets using their HuggingFace dataset names, as follows:

ami: "edinburghcstr/ami" "ihm" split="train"

fsd50k_noisy: "sps44/fsdnoisy18k"
audioset: "benjamin-paine/audio-set-16khz"
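The names above map onto `datasets.load_dataset` calls roughly as sketched below. The `DATASET_SPECS` table and the `load_named_dataset` helper are hypothetical conveniences, not code from this repository. Only the AMI entry has a configuration (`"ihm"`) and split (`"train"`) given above, so the other two are loaded with library defaults here, which may not match the project's actual usage.

```python
# Dataset identifiers as listed above; keys are the short names used in this README.
DATASET_SPECS = {
    "ami": {"path": "edinburghcstr/ami", "name": "ihm", "split": "train"},
    "fsd50k_noisy": {"path": "sps44/fsdnoisy18k"},
    "audioset": {"path": "benjamin-paine/audio-set-16khz"},
}


def load_named_dataset(key):
    """Download (or reuse from the cache) the dataset registered under key."""
    # Imported lazily so that HF_DATASETS_CACHE can be set beforehand.
    from datasets import load_dataset

    return load_dataset(**DATASET_SPECS[key])


# Example call (requires the `datasets` package and network access):
# ami_train = load_named_dataset("ami")
```

Without a `split` argument, `load_dataset` returns a `DatasetDict` with every available split, so the two datasets without an explicit split come back in that form.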

