Abstract:
Grapheme-to-Phoneme (G2P) conversion is the task of automatically generating the
pronunciation of a given input word. A pronunciation dictionary is one of the most important
components for building automatic speech recognition (ASR) and text-to-speech (TTS) systems.
A G2P conversion model is implemented for foreign words in the Myanmar language using n-gram language modeling and a Weighted Finite State Transducer (WFST) based approach.
First, a pronunciation dictionary was built for foreign words in the Myanmar language. The
alignment of each corresponding grapheme and phoneme sequence pair was then generated from the
dictionary. A joint n-gram model was trained on the joint grapheme-phoneme chunks
aligned during the training process. Finally, the joint n-gram model was converted to an
equivalent WFST. The performance of the model was
evaluated using Phoneme Error Rate (PER). To check the validity of the manually
prepared pronunciation dictionary and the consistency of the G2P model's performance,
10-fold cross-validation was applied to the data, and an average PER of 2.36%
was obtained on the test data.
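
As an illustration of the evaluation metric, the following is a minimal sketch of how PER can be computed, assuming the common definition: the total Levenshtein edit distance (substitutions, insertions, deletions) between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The function names and the phoneme symbols in the example are hypothetical and are not taken from the dictionary described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) pairs, returned as a fraction."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total if total else 0.0


# Hypothetical phoneme symbols, for illustration only.
example_pairs = [
    (["k", "a", "t"], ["k", "a", "d"]),  # one substitution
    (["b", "a"], ["b", "a"]),            # exact match
]
example_per = phoneme_error_rate(example_pairs)  # 1 error / 5 reference phonemes
```

Under this definition, an average PER for 10-fold cross-validation would be the mean of the ten per-fold values, each computed on the fold held out from training.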