UCSY's Research Repository

Myanmar Speech Classification Using Transfer Learning for Image Classification


dc.contributor.author Khin, Ou Ou
dc.contributor.author Thu, Ye Kyaw
dc.contributor.author Sakata, Tadashi
dc.contributor.author Sagisaka, Yoshinori
dc.contributor.author Ueda, Yuichi
dc.date.accessioned 2019-07-22T08:44:38Z
dc.date.available 2019-07-22T08:44:38Z
dc.date.issued 2019-02-27
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/1192
dc.description The authors gratefully acknowledge the teachers and students from the University of Computer Studies, Banmaw, who participated in recording the sounds for the Myanmar consonants and vowels, and other speakers from Kumamoto University, who aided in recording the words. en_US
dc.description.abstract In this paper, we discuss our research on speech classification for the Myanmar language using an image-classification approach. We tested the method on Myanmar consonants, vowels, and words, using our recorded database of 22 consonant, 12 vowel, and 54 word sound classes, represented as spectrograms of Myanmar speech. Because the Myanmar language is tonal, many sounds are too similar to classify precisely from audio features alone, whereas their visual representations differ. It is therefore important to consider the visual representation of audio when classifying Myanmar speech. In this study, we fed spectrogram images of Myanmar speech to a convolutional neural network (Inception-v3), performing transfer learning from weights pre-trained on ImageNet. Validation accuracies of 60.70%, 73.20%, and 94.60% were achieved for the consonant, vowel, and word-level classifications, respectively. To evaluate the retrained model's performance, both closed and open testing were conducted. Although our experiment differed from traditional audio classification methods, promising results were obtained in this first exploration of Myanmar speech classification using transfer learning for image classification. Notably, these results were attained with Google's Inception-v3 model, which was originally trained on a different image domain. The research and results therefore demonstrate that Myanmar speech classification by this approach is feasible. en_US
dc.language.iso en en_US
dc.publisher Seventeenth International Conference on Computer Applications (ICCA 2019) en_US
dc.title Myanmar Speech Classification Using Transfer Learning for Image Classification en_US
dc.type Article en_US
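The abstract's pipeline starts by converting each utterance into a spectrogram image before it is passed to the CNN. The paper's own code is not included in this record, but the first step can be sketched as a minimal short-time Fourier transform in NumPy (a generic illustration, not the authors' exact pipeline; frame length, hop size, and the synthetic test tone are all assumptions):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Compute a log-magnitude spectrogram (time x frequency) of a 1-D signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # log scale mimics the dB-scaled images typically fed to an image classifier
    return 20 * np.log10(mag + 1e-10)

# Synthetic 1-second "utterance": a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(sig)
print(spec.shape)  # → (124, 129): 124 frames x 129 frequency bins
```

The resulting 2-D array is then saved or rendered as an image, resized to Inception-v3's expected input, and classified by the network after its final layer is retrained on the spectrogram classes, transferring the ImageNet-pretrained weights as described in the abstract.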

