Myanmar Speech Classification Using Transfer Learning for Image Classification

Khin, Ou Ou; Thu, Ye Kyaw; Sakata, Tadashi; SAGISAKA, Yoshinori; Ueda, Yuichi

dc.contributor.author	Khin, Ou Ou
dc.contributor.author	Thu, Ye Kyaw
dc.contributor.author	Sakata, Tadashi
dc.contributor.author	SAGISAKA, Yoshinori
dc.contributor.author	Ueda, Yuichi
dc.date.accessioned	2019-07-22T08:44:38Z
dc.date.available	2019-07-22T08:44:38Z
dc.date.issued	2019-02-27
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/1192
dc.description	The authors gratefully acknowledge the teachers and students from the University of Computer Studies, Banmaw, who participated in recording the sounds for the Myanmar consonants and vowels, and the other speakers from Kumamoto University, who helped record the words.	en_US
dc.description.abstract	This paper discusses our research on speech classification for the Myanmar language using an image classification approach. We tested the method on Myanmar consonants, vowels, and words, using our recorded database of 22 consonant, 12 vowel, and 54 word sound classes, represented as spectrograms of Myanmar speech. Because the Myanmar language is tonal, many sounds are too similar to classify precisely from audio features, while their visual representations differ. Therefore, it is important to consider the visual representations of audio when classifying Myanmar speech. In this study, we retrained a convolutional neural network model (Inception-v3) on spectrogram images of Myanmar speech, performing transfer learning from weights pre-trained on ImageNet. Validation accuracies of 60.70%, 73.20%, and 94.60% were achieved for the consonant, vowel, and word-level classifications, respectively. To assess the performance of the retrained model, both closed and open testing were conducted. Although our experiment differs from traditional audio classification methods, it yielded promising results for a first exploration of Myanmar speech classification using transfer learning for image classification. Notably, these results were obtained with Google’s Inception-v3 model, which was constructed from a different image domain. The research and results therefore demonstrate that Myanmar speech classification with this approach is feasible.	en_US
dc.language.iso	en	en_US
dc.publisher	Seventeenth International Conference on Computer Applications (ICCA 2019)	en_US
dc.title	Myanmar Speech Classification Using Transfer Learning for Image Classification	en_US
dc.type	Article	en_US
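
The abstract describes classifying speech from spectrogram images. As a rough illustration of that first step only, the sketch below turns a waveform into a log-scaled grayscale spectrogram using the Python standard library. The record does not say how the authors generated their spectrograms, so the frame length, hop size, Hann window, and -80 dB floor are all assumptions chosen for the example, and the function names are hypothetical. Retraining Inception-v3 on the resulting images is not shown.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: a list of frames (time), each a list of
    frame_len // 2 + 1 frequency-bin magnitudes.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Naive DFT for bin k; fine for a small demonstration.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def to_log_image(frames, floor_db=-80.0):
    """Map magnitudes to 0..255 grayscale on a dB scale relative to the peak.

    The -80 dB floor is an assumption made for this sketch.
    """
    peak = max(max(f) for f in frames) or 1.0
    image = []
    for f in frames:
        row = []
        for m in f:
            db = 20 * math.log10(max(m / peak, 1e-10))
            db = max(db, floor_db)
            row.append(round(255 * (db - floor_db) / -floor_db))
        image.append(row)
    return image


# Usage: a 1 kHz test tone sampled at 8 kHz stands in for recorded speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
frames = spectrogram(tone)
image = to_log_image(frames)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
print(len(frames), len(frames[0]), peak_bin)
```

Each row of `image` is one time frame and each column one frequency bin; in practice the array would be saved as an image file and resized to the input size the image classifier expects.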