Abstract:
State-of-the-art machine translation systems now perform well for some
foreign languages (high-resource languages). Machine translation (MT) is the
automatic translation of one natural language into another by means of a
computerized system. Many studies have applied machine translation not only
to foreign languages but also to Myanmar ethnic languages (low-resource
languages), for language pairs such as English-Myanmar, Myanmar-Rakhine,
Myanmar-Dawei and Kachin-Rawang. In this work, over 10K Karen-English
parallel sentences were collected from published Karen-English books via the
internet and other sources. A phrase-based statistical machine translation
(PBSMT) system built with the Moses toolkit is proposed for the Karen-English
language pair. The word
segmented source text was aligned with the word segmented target text
using GIZA++. The alignments were symmetrized with the grow-diag-final-and
heuristic.
The lexicalized reordering model was trained with the msd-bidirectional-fe option.
We trained 2-gram, 3-gram and 5-gram language models with KenLM and SRILM
for both the Karen-to-English and English-to-Karen directions. Minimum
error rate training (MERT) was used to tune the decoder parameters, and
decoding was performed with the Moses decoder. Finally, the systems were
evaluated and compared in terms of BLEU score. For the Karen-to-English PBSMT
model, KenLM with a 5-gram language model gave the best result. For the
English-to-Karen PBSMT model, KenLM with a 3-gram language model gave the best
result.
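
Since the systems above are compared by BLEU score, a minimal sketch of how a
corpus-level BLEU score can be computed may help readers interpret the
numbers. The function names (ngrams, corpus_bleu), the single reference per
sentence, the 4-gram limit and the absence of smoothing are assumptions of
this sketch; it is not necessarily the scoring script used in the experiments.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing.

    hypotheses, references: parallel lists of token lists (hypothetical
    input format chosen for this sketch).
    """
    matched = [0] * max_n  # clipped n-gram matches per order
    total = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    if hyp_len > ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

In this sketch, each hypothesis and reference is a list of word-segmented
tokens, so the same function applies to either translation direction once the
output has been segmented.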