SpeechLaughRecogniser

An ASR model for transcribing laughter and speech-laugh in conversational speech

Dataset

Switchboard data

path=/deepstore/datasets/hmi/speechlaugh-corpus # global data path
  • Use gdown to download the .zip data file and unzip it:
gdown 1VlQlyY3v3wtT2S047lwlTirWisz5mQ18 -O /path/to/data/switchboard.zip

#path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data # global datasets path

cd /path/to/data #/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data

unzip switchboard.zip

# after unzip, the data will contain the following folders:
# - audio_wav
# - transcripts
  • Generate the audio_segments folder (see the segmentation sketch below); it can be stored in the following path:
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_segments
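A minimal sketch of how such segments could be cut from the conversation audio, assuming the transcripts provide per-utterance start and end times and that the soundfile package is available; the file names, timestamps and cut_segment helper are illustrative, not the repository's actual preprocessing code.

import os
import soundfile as sf

def cut_segment(wav_path, start_s, end_s, out_path):
    # Read the full conversation-side audio and slice out one utterance
    audio, sr = sf.read(wav_path)
    sf.write(out_path, audio[int(start_s * sr):int(end_s * sr)], sr)

# Illustrative usage for one utterance of one conversation side
wav = "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_wav/sw02001A.wav"
out_dir = "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/audio_segments"
os.makedirs(out_dir, exist_ok=True)
cut_segment(wav, 0.98, 3.25, os.path.join(out_dir, "sw02001A_0001.wav"))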

VocalSound

  • Download the dataset from VocalSound and save it to the path/to/data/vocalsound_data folder:
wget -O vocalsound_16k.zip https://www.dropbox.com/s/c5ace70qh1vbyzb/vs_release_16k.zip?dl=1

#path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data

unzip vocalsound_16k.zip
  • The path to the data would be:
path=/deepstore/datasets/hmi/speechlaugh-corpus/vocalsound_data/audio_16k

Other datasets (AMI, FSD50K-noisy, AudioSet, etc.)

  • Download these datasets from HuggingFace Datasets and save them to the data/huggingface_data folder.
  1. First, set the HuggingFace datasets cache path to this folder:
$ export HF_DATASETS_CACHE="../data/huggingface_data"

# or change to the global datasets
$ export HF_DATASETS_CACHE="/deepstore/datasets/hmi/speechlaugh-corpus/huggingface_data"
  2. Then download the datasets, using the following HuggingFace dataset names (see the loading sketch after this list):
  • ami: "edinburghcstr/ami" "ihm" split="train"
  • fsd50k_noisy: "sps44/fsdnoisy18k"
  • audioset: "benjamin-paine/audio-set-16khz"
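
A minimal sketch of pulling these corpora with the HuggingFace datasets library, assuming HF_DATASETS_CACHE has been exported as above; only the AMI name, config and split are taken from this README, the other split names are assumptions.

from datasets import load_dataset

# Downloads are cached under $HF_DATASETS_CACHE (e.g. .../huggingface_data)
ami = load_dataset("edinburghcstr/ami", "ihm", split="train")
fsd50k_noisy = load_dataset("sps44/fsdnoisy18k", split="train")            # split name assumed
audioset = load_dataset("benjamin-paine/audio-set-16khz", split="train")   # split name assumed

print(ami)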

Preprocessing

Datasets Used for Training

After preprocessing, the data is separated, cleaned, retokenized and stored as 3 separate datasets, corresponding to the 3 token types used for training and evaluation:

  • switchboard_speech
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speech
  • switchboard_laugh
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_laugh

"Laughter dataset with `intext = True`": Dataset({
    features: ['audio', 'sampling_rate', 'transcript'],
    num_rows: 6900
})
  • switchboard_speechlaugh
path=/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speechlaugh

"Speech-laugh dataset": Dataset({
    features: ['audio', 'sampling_rate', 'transcript'],
    num_rows: 7672
})
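
The Dataset({...}) summaries above match the HuggingFace datasets on-disk format, so a split can presumably be reloaded as sketched below; load_from_disk is an assumption about how the splits were stored.

from datasets import load_from_disk

# Reload one preprocessed split (assuming it was written with Dataset.save_to_disk)
swb_speechlaugh = load_from_disk(
    "/deepstore/datasets/hmi/speechlaugh-corpus/switchboard_data/swb_speechlaugh"
)
print(swb_speechlaugh)                   # features: ['audio', 'sampling_rate', 'transcript']
print(swb_speechlaugh[0]["transcript"])  # first transcript of the speech-laugh split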

Training

To check the disk usage of the models directory and the datasets in global storage, navigate to the storage and use the du command.

cd /path/to/storage
du -sh * | sort -hr
