$\begingroup$

While reading this (see feature extraction section, subsection 4.1), I came across the following:

The per-frame values for each coefficient are summarized across time using the following summary statistics: minimum, maximum, median, mean, variance, skewness, kurtosis and the mean and variance of the first and second derivatives, resulting in a feature vector of dimension 225 per slice.

From what I understand, the author is trying to summarize a ~4 second audio recording by extracting a feature well known for its effectiveness (MFCC). Since the extracted MFCCs form a matrix (and a feature vector is needed), the matrix is summarized across time in several ways (min, max, median, mean, variance, skewness, kurtosis, and statistics of the derivatives). Concatenating those summaries gives the final feature vector of length 225.

Questions:

  1. Why and when should one use derivatives of features? Is this a common thing to do?

  2. How does one compute the derivatives of an MFCC matrix?

I'm especially interested in the second question.

$\endgroup$

1 Answer
$\begingroup$

The first derivative is often referred to as delta MFCC, and the second derivative as delta-delta MFCC. The same concept can also be applied to mel spectrograms.

The derivatives of the MFCC capture change over time: how much each coefficient varies from one frame to the next. A constant sound can have a large summarized mean MFCC, but its summarized mean delta MFCC will be close to zero. Adding these features has been shown to improve results on speech classification tasks, for instance.

The delta MFCC is computed per frame. In its simplest form, the delta for a frame is the current frame's MFCC values minus the previous frame's MFCC values, and the delta-delta is the same operation applied to the deltas. In practice it is common to also apply a smoothing filter, because the difference operation is naturally sensitive to noise.
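The plain frame difference described above can be sketched in a few lines of NumPy. This is only an illustration with no smoothing; the function name `simple_delta`, the toy matrix, and the choice to set the first frame's delta to zero are assumptions made for this example, not something taken from the paper or from any library.

```python
import numpy as np

def simple_delta(features):
    """Unsmoothed first-order difference along the time axis (axis 1).

    `features` has shape (n_coefficients, n_frames). The first frame has no
    predecessor, so its delta is set to zero to keep the shape unchanged
    (an arbitrary choice for this sketch).
    """
    features = np.asarray(features, dtype=float)
    delta = np.zeros_like(features)
    delta[:, 1:] = features[:, 1:] - features[:, :-1]
    return delta

# Toy "MFCC" matrix: 2 coefficients, 4 frames.
toy = np.array([[1.0, 2.0, 4.0, 7.0],   # changes over time
                [5.0, 5.0, 5.0, 5.0]])  # constant over time

d1 = simple_delta(toy)  # delta
d2 = simple_delta(d1)   # delta-delta: the same operation applied again

print(d1)  # first row: 0, 1, 2, 3; second row: all zeros
print(d2)  # first row: 0, 1, 1, 1; second row: all zeros
```

The constant row has a large mean but an all-zero delta, which matches the intuition about constant sounds given earlier.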

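For example, in Python one can use librosa to compute the MFCCs and their deltas. Note that `librosa.feature.delta` estimates a smoothed derivative over a window of frames rather than taking a raw frame-to-frame difference.

import librosa

# Bundled example clip (recent librosa; older releases used librosa.util.example_audio_file())
y, sr = librosa.load(librosa.ex('trumpet'))
mfcc = librosa.feature.mfcc(y=y, sr=sr)  # shape: (n_mfcc, n_frames)
# Smoothed first and second derivatives along the time axis
mfcc_delta = librosa.feature.delta(mfcc, order=1)
mfcc_delta2 = librosa.feature.delta(mfcc, order=2)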
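
To connect this to the summary statistics quoted in the question, here is a rough sketch of collapsing an MFCC matrix into one feature vector. It is a guess at the general approach, not the paper's exact procedure: the function name `summarize`, the random stand-in matrix with 13 coefficients, and the use of plain `np.diff` for the derivatives are all assumptions. It produces 11 statistics per coefficient and does not try to reproduce the paper's dimension of 225.

```python
import numpy as np

def summarize(mfcc):
    """Collapse an (n_coefficients, n_frames) matrix into a 1-D feature vector."""
    mfcc = np.asarray(mfcc, dtype=float)

    # Unsmoothed first and second differences along the time axis.
    d1 = np.diff(mfcc, n=1, axis=1)
    d2 = np.diff(mfcc, n=2, axis=1)

    mean = mfcc.mean(axis=1)
    var = mfcc.var(axis=1)
    std = np.sqrt(var)
    # Guard against division by zero for rows that are constant over time.
    safe_std = np.where(std > 0, std, 1.0)
    z = (mfcc - mean[:, None]) / safe_std[:, None]
    skewness = (z ** 3).mean(axis=1)
    kurtosis = (z ** 4).mean(axis=1) - 3.0  # excess kurtosis

    stats = [
        mfcc.min(axis=1), mfcc.max(axis=1), np.median(mfcc, axis=1),
        mean, var, skewness, kurtosis,
        d1.mean(axis=1), d1.var(axis=1),
        d2.mean(axis=1), d2.var(axis=1),
    ]
    # Each statistic has one value per coefficient; concatenate them all.
    return np.concatenate(stats)

# Random stand-in for a real MFCC matrix: 13 coefficients, 100 frames.
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(13, 100))

vec = summarize(fake_mfcc)
print(vec.shape)  # (143,), i.e. 11 statistics x 13 coefficients
```

With real audio, `fake_mfcc` would be replaced by the matrix returned from the MFCC extraction step, and the derivative statistics could instead be taken from smoothed deltas.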
$\endgroup$
