Abstract:
In recent years, statistical word alignment
models have been widely used for various Natural
Language Processing (NLP) problems. In this paper
we describe our work in constructing an aligned
English-Myanmar parallel corpus. Such corpora are not
available for the Myanmar language, and we hope that
the parallel corpus we develop will be useful in
many natural language applications. Word alignment
plays a crucial role in
statistical machine translation, since word-aligned
corpora have been found to be an excellent source of
translation-related knowledge. Errors in alignment
cause failures in subsequent NLP processes. The
alignments produced when training on word-aligned
data are dramatically better than those produced
when training on sentence-aligned data.
The main purpose of this system is to serve as a
component of a Myanmar-English machine translation
system. The proposed system combines a corpus-based
approach with a dictionary lookup approach. The
corpus-based approach is based on the first three
IBM models.
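
As a rough illustration of the corpus-based component, the sketch below shows EM training for IBM Model 1, the simplest of the three IBM models mentioned above. It is a minimal sketch under stated assumptions, not the system described in this paper: the function name `train_ibm_model1`, the `<NULL>` token, and the toy sentence pairs are hypothetical placeholders, and the toy data is not drawn from the English-Myanmar corpus.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty word that unaligned source words can attach to


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] ~ P(f | e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        # E-step: distribute each source word over the target words of its sentence.
        for fs, es in pairs:
            es_null = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy placeholder data (assumption, not from the paper's corpus).
toy_pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(toy_pairs)
```

After training, the most probable alignment for a source word is the target word (or `<NULL>`) with the highest `t` value within the same sentence pair. Models 2 and 3 add position and fertility parameters on top of these lexical probabilities and are not shown here.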