Abstract:
Automatic speech recognition enables a natural and easy mode of
interaction between humans and machines. Speech recognition
translates a speech signal into a text sequence.
Many researchers have developed Automatic Speech Recognition (ASR) for
their own languages to give their nations access to language
technologies. This thesis therefore aims to develop automatic speech
recognition for the Rakhine language, the language of one of the main
ethnic groups in Myanmar. Rakhine is a low-resource language, and no
speech data are freely available. Thus, in this work, a speech corpus is
built for two domains: broadcast news and daily conversation. The
broadcast news data are collected from the web, and the conversational
data are recorded in the author's own voice. This corpus is used to
develop the Rakhine ASR system.
Feature extraction is one of the components of ASR; its function is to
extract features from incoming speech signals. In this work, the Mel
Frequency Cepstral Coefficient (MFCC) feature extraction technique is used.
Because a phonetic dictionary is an essential part of a Rakhine speech
recognition system, a pronunciation lexicon is built for the Rakhine
language in this work. A Rakhine language model is also created using
n-grams. In ASR, the acoustic model is a crucial component that
establishes the connection between acoustic features and phonetic units.
For this Rakhine ASR research, the Gaussian Mixture Model based Hidden
Markov Model (HMM-GMM) is used. With HMM-GMM, the Rakhine ASR system
achieves promising results.
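
As an illustration of the n-gram language modelling mentioned above, the following is a minimal sketch only, not the thesis implementation. It assumes a bigram order and add-one (Laplace) smoothing, neither of which is stated in the abstract, and it uses a hypothetical toy corpus of placeholder tokens rather than Rakhine data. The function names train_bigram and bigram_prob are illustrative.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev).

    Assumption: the vocabulary is every symbol seen in training,
    including the boundary markers.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical toy corpus (placeholder tokens, not Rakhine data).
corpus = [["a", "b", "c"], ["a", "b", "d"]]
unigrams, bigrams = train_bigram(corpus)

# Probability that "b" follows "a" under the smoothed bigram model.
p_ab = bigram_prob("a", "b", unigrams, bigrams)
print(p_ab)
```

In a full recognizer, probabilities of this kind would be combined with acoustic model scores during decoding; a practical system would normally rely on an established language modelling toolkit and a stronger smoothing method than add-one.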