Abstract:
Time Delay Neural Network (TDNN) is one of the neural
network architectures. In Automatic Speech Recognition
(ASR), TDNN is well suited to context modeling and
recognizes phonemes and acoustic features independent of
their position in time. Many techniques have been applied
to improve Myanmar speech processing. This paper presents
a TDNN-based acoustic model for Myanmar ASR.
Myanmar is a low-resource language, and no pre-collected
data is available. This experiment uses a larger dataset
and lexicon than our previous work. The speech
corpus contains three domains: names, web news data, and
daily conversational data. The corpus totals 77 hours,
2 minutes, and 11 seconds of speech and includes 233 female
speakers and 97 male speakers. The performance of TDNN for Myanmar
ASR is evaluated by comparison with a Gaussian Mixture Model
(GMM) baseline system, a Deep Neural Network (DNN),
and a Convolutional Neural Network (CNN). The evaluation
uses two test sets: TestSet1 (web news) and
TestSet2 (recorded conversational data). The experimental
results show that TDNN outperforms GMM-HMM, DNN and
CNN.