SpeechLaughRecogniser

An ASR model for transcribing laughter and speech-laugh in conversational speech

Dataset

Switchboard data

path=/deepstore/datasets/hmi/speechlaugh-corpus # global data path
  • Use gdown to download the .zip data file and unzip it:
gdown 1VlQlyY3v3wtT2S047lwlTirWisz5mQ18 -O /path/to/data/switchboard.zip

#path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data # global datasets path

cd /path/to/data #/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data

unzip switchboard.zip

# after unzip, the data will contain the following folders:
# - audio_wav
# - transcripts
  • Generate the audio_segments folder (see the segmentation sketch below); it can be stored in the following path:
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_segments
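A minimal sketch of how such segments could be cut from the conversation audio, assuming the transcripts provide per-utterance start and end times and that the soundfile package is available; the file names, timestamps and cut_segment helper are illustrative, not the repository's actual preprocessing code.

import os
import soundfile as sf

def cut_segment(wav_path, start_s, end_s, out_path):
    # Read the full conversation-side audio and slice out one utterance
    audio, sr = sf.read(wav_path)
    sf.write(out_path, audio[int(start_s * sr):int(end_s * sr)], sr)

# Illustrative usage for one utterance of one conversation side
wav = "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_wav/sw02001A.wav"
out_dir = "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_segments"
os.makedirs(out_dir, exist_ok=True)
cut_segment(wav, 0.98, 3.25, os.path.join(out_dir, "sw02001A_0001.wav"))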

VocalSound

  • Download the dataset from VocalSound and save it to the path/to/data/vocalsound_data folder:
wget -O vocalsound_16k.zip https://www.dropbox.com/s/c5ace70qh1vbyzb/vs_release_16k.zip?dl=1

#path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data

unzip vocalsound_16k.zip
  • The path to the data would be:
path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data/audio_16k

Other datasets (AMI, FSD50K-noisy, AudioSet, etc.)

  • Download these datasets from HuggingFace Datasets and save them to the data/huggingface_data folder.
  1. First, set the HuggingFace datasets cache path to this folder:
$ export HF_DATASETS_CACHE="../data/huggingface_data"

# or change to the global datasets
$ export HF_DATASETS_CACHE="/deepstore/datasets/hmi/speechlaugh-corpus/huggingface_data"
  2. Then download the datasets, using the following HuggingFace dataset names (see the loading sketch after this list):
  • ami: "edinburghcstr/ami" "ihm" split="train"
  • fsd50k_noisy: "sps44/fsdnoisy18k"
  • audioset: "benjamin-paine/audio-set-16khz"
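
A minimal sketch of pulling these corpora with the HuggingFace datasets library, assuming HF_DATASETS_CACHE has been exported as above; only the AMI name, config and split are taken from this README, the other split names are assumptions.

from datasets import load_dataset

# Downloads are cached under $HF_DATASETS_CACHE (e.g. .../huggingface_data)
ami = load_dataset("edinburghcstr/ami", "ihm", split="train")
fsd50k_noisy = load_dataset("sps44/fsdnoisy18k", split="train")            # split name assumed
audioset = load_dataset("benjamin-paine/audio-set-16khz", split="train")   # split name assumed

print(ami)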

Preprocessing

Datasets Used for Training

After preprocessing, the data is separated, cleaned, retokenized and stored as 3 separate datasets, corresponding to the 3 token types used for training and evaluation:

  • switchboard_speech
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speech
  • switchboard_laugh
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_laugh

"Laughter dataset with `intext = True`": Dataset({
    features: ['audio', 'sampling_rate', 'transcript'],
    num_rows: 6900
})
  • switchboard_speechlaugh
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speechlaugh

"Speech-laugh dataset": Dataset({
    features: ['audio', 'sampling_rate', 'transcript'],
    num_rows: 7672
})
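
The Dataset({...}) summaries above match the HuggingFace datasets on-disk format, so a split can presumably be reloaded as sketched below; load_from_disk is an assumption about how the splits were stored.

from datasets import load_from_disk

# Reload one preprocessed split (assuming it was written with Dataset.save_to_disk)
swb_speechlaugh = load_from_disk(
    "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speechlaugh"
)
print(swb_speechlaugh)                   # features: ['audio', 'sampling_rate', 'transcript']
print(swb_speechlaugh[0]["transcript"])  # first transcript of the speech-laugh split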

Training

To check the disk usage of the models directory and the datasets in global storage, navigate to the storage and use the du command.

cd /path/to/storage
du -sh * | sort -hr
