dc.description.abstract |
Researchers in many countries have developed automatic speech recognition
(ASR) for their own languages to demonstrate national progress in information
and communication technology.
This dissertation aims to develop good-quality automatic speech recognition
for read speech in the Myanmar language. Myanmar is considered a
low-resourced language, and no speech corpus is available, either freely or
commercially, for ASR research. Therefore, a speech corpus named “University of
Computer Studies Yangon - Speech Corpus (UCSY-SC1)”, which is essential for
Myanmar ASR research, is constructed. The corpus covers two domains: web news
and daily conversations. The news data is collected from the Internet, and the
conversational data is self-recorded. This corpus is used to build the Myanmar
ASR system.
Myanmar is a tonal language in which different tones convey differences in
meaning. Therefore, as in other tonal languages such as Mandarin, Vietnamese
and Thai, tone information plays a significant role in improving ASR
performance. Moreover, the syllable is the basic unit of the Myanmar language.
Thus, in this work, the effect of tones is explored on both syllable-based and
word-based ASR models, and the two kinds of model are also compared.
In this work, Myanmar ASR is built with a state-of-the-art acoustic model,
the Convolutional Neural Network (CNN). In low-resourced conditions, a CNN is
better than a Deep Neural Network (DNN) because the fully connected nature of
the DNN can cause overfitting, which degrades ASR performance for
low-resourced languages where training data is limited. A CNN can alleviate
this problem and is therefore very useful for a low-resourced language such as
Myanmar. Furthermore, a CNN can model tone patterns well because it can reduce
spectral variations and model spectral correlations in the signal. In this
task, the CNN outperformed both the DNN and the Gaussian Mixture Model
(GMM)-Hidden Markov Model (HMM). The best accuracy for Myanmar ASR is achieved
with the CNN-based model. |
en_US |