Abstract:
Machines are now built to emulate human abilities: they can learn and perceive with human-like intelligence. Emotions can be recognized from facial expressions and gestures, and speech emotion recognition (SER) is the extraction of emotions from human speech. Even for humans, recognizing the emotion in an utterance is difficult without considering what it means, so for machines it is a very difficult task. SER can be applied in several fields.
Research on Burmese speech emotion recognition is scarce, and speech emotion datasets for the Burmese language are a low resource. Moreover, an effective fusion of feature extraction methods can achieve better results than any single method. The proposed Burmese speech emotion classification uses the BMISEC dataset, the Burmese Movies Interviews Speech Emotion Corpus, which was prepared as carefully as possible. The deep learning architecture DenseNet has several advantages: it strengthens gradient propagation and improves information flow, which makes it easier to train than comparable models. In the proposed system, DenseNet classifies the fusion of audio features and image features. In the DenseNet-Emotion model used in the system, SelectKBest feature selection selects the best features, and an SVM is used in the classifier layer of the model. A novel feature extraction method for Burmese speech emotion classification, called Text-tone feature extraction, is proposed: emotional Myanmar sentences are segmented into words, the words are converted into speech, and pitch, loudness, and duration features are extracted from these emotion words.
In Burmese speech there are four tones: low, high, creaky, and checked, and pitch, loudness, and duration distinguish these four tones very well. For emotion spectrograms, the system uses the Local Binary Pattern (LBP), a method that handles diverse objects well and performs strongly on speech emotion spectrograms of varying intensity; it extracts emotion information effectively from Burmese emotion spectrograms as well as those of many other languages. These two feature extraction methods are supported by two popular speech feature extraction methods, Mel-frequency Cepstral Coefficients (MFCC) and Discrete Wavelet Transform (DWT), which give excellent results when the emotional speech is free of noise. BMISEC provides a well-built foundation for the proposed system and therefore supports it very well. Fusing the advantages of each feature extraction method gives better results than any single method, so the novel feature extraction is combined with the other three to obtain excellent results. Seven emotion types are classified: happy, angry, disgust, fear, surprise, sad, and neutral. The proposed system achieves an emotion classification accuracy of 88.388% in only 50 epochs.
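As a rough sketch of how the LBP step could turn a spectrogram into an image-feature vector (the 3x3 neighbourhood, thresholding rule, and histogram pooling below are generic LBP conventions, not necessarily the paper's exact configuration):

```python
import numpy as np

def lbp_codes(spec):
    """Basic 8-neighbour Local Binary Pattern over a 2D array
    (e.g. a log-mel spectrogram); border pixels are skipped."""
    h, w = spec.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Clockwise 3x3 neighbour offsets starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = spec[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = spec[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least the centre value.
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(spec):
    """256-bin normalized histogram of LBP codes, usable as an
    image-feature vector for the emotion classifier."""
    hist = np.bincount(lbp_codes(spec).ravel(), minlength=256)
    return hist / hist.sum()

# Usage on a toy "spectrogram".
rng = np.random.default_rng(0)
spec = rng.random((64, 64))
feat = lbp_histogram(spec)   # shape (256,), sums to 1
```

Because the codes depend only on local intensity ordering, the resulting histogram is largely insensitive to the overall intensity of the spectrogram, which is consistent with the robustness to varying intensities noted above.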