The Mel-frequency cepstrum coefficient for music emotion recognition in machine learning
Thesis posted on 28.03.2022, 17:29, authored by Ai-Phee Chris Yong
The Mel-Frequency Cepstral Coefficient (MFCC), a technique originally designed for speech analysis, has in recent years become very popular in music emotion recognition. MFCC uses the Mel scale to simulate human auditory perception, logarithmic compression to reduce the influence of noise, and the Discrete Cosine Transform (DCT) to compactly summarise the salient features without losing critical information. These techniques, while well suited to speech analysis, may not always be appropriate for music analysis. We suggest that, for Music Emotion Recognition (MER), spectral and temporal features, which have a deep historical foundation in music analysis, may be the more relevant features to use. We extract three feature types, MFCC, spectral, and temporal, from clips of songs in the '1000 songs' dataset and use each in turn to train a simple Artificial Neural Network (ANN). The trained ANN then predicts the emotion value of songs, and the prediction error is computed from the predicted value and the actual annotated value. The feature type that produces the lowest prediction error is judged the most suitable for MER. Our results show that spectral features produced the lowest error, whereas MFCC produced the highest prediction error; this suggests that MFCC may not be a suitable feature for MER.