Building Word-Aligned Bilingual Corpus for Statistical Myanmar-English Translation

Nwet, Khin Thandar

Conference: Fifth Local Conference on Parallel and Soft Computing

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/1135

Date: 2010-12-16

Abstract:

In recent years, statistical word alignment models have been widely used for various Natural Language Processing (NLP) problems. In this paper, we describe our work on constructing a word-aligned English-Myanmar parallel corpus. Corpora are not available for the Myanmar language, and we hope that our work on developing a parallel corpus will be useful in many natural language applications. Word alignment plays a crucial role in statistical machine translation, since word-aligned corpora have been found to be an excellent source of translation-related knowledge. Errors in alignment cause failures in subsequent NLP processes. The alignments produced when training on word-aligned data are dramatically better than those produced when training on sentence-aligned data. The main purpose of this system is to serve as a component of a Myanmar-English machine translation system. The proposed system combines a corpus-based approach with a dictionary lookup approach. The corpus-based approach is based on the first three IBM models
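
As an illustration of the kind of combination the abstract describes, the following is a minimal sketch of IBM Model 1 (the simplest of the IBM models) trained with EM, followed by an aligner that consults a bilingual dictionary before falling back to the learned probabilities. It is not the authors' implementation. The function names train_ibm1 and align, the dictionary format, and the rule that a dictionary hit takes priority are all assumptions made for this example, and the three-sentence German-English corpus is a common textbook toy used as a stand-in, not data from the paper.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(e|f) with EM for IBM Model 1 (NULL word omitted for brevity).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like table keyed by (target_word, source_word).
    """
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Uniform initialisation of t(e|f).
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f being linked
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(
            float, {(e, f): count[(e, f)] / total[f] for (e, f) in count}
        )
    return t


def align(src, tgt, t, dictionary=None):
    """Link each target position j to one source position i.

    dictionary (hypothetical format): source word -> set of target words.
    A dictionary hit takes priority; otherwise the most probable
    source word under t(e|f) is chosen.
    """
    dictionary = dictionary or {}
    links = []
    for j, e in enumerate(tgt):
        hits = [i for i, f in enumerate(src) if e in dictionary.get(f, ())]
        if hits:
            best = hits[0]
        else:
            best = max(range(len(src)), key=lambda i: t[(e, src[i])])
        links.append((best, j))
    return links


# Toy stand-in corpus (source tokens, target tokens); not from the paper.
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(pairs)
links = align(["das", "Haus"], ["the", "house"], t)
```

Here links holds (source index, target index) pairs for one sentence pair. IBM Models 2 and 3 extend this sketch with position-dependent alignment probabilities and fertility, which are left out here.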
