Abstract:
In natural language processing (NLP)
applications, sentence analysis is one of the
important phases of machine translation
systems. Currently, no mature deep analysis is
available for the Myanmar language.
To perform shallow parsing on
sentences, chunk identification is a
fundamental task. The creation of a POS tagged
corpus was proposed in [8]; in this
paper, we propose a methodology for
building a chunk tagged corpus for the Myanmar
language. We use the POS tagged corpus that is
proposed in [8] and identify chunks in Myanmar
POS tagged texts. Our approach is rule-based:
a set of rules identifies all chunks in a Myanmar
sentence. As a preprocessing step, normalization
of POS tags is needed in order to
produce finer tags. Hence, we also develop
normalization rules. After normalization, chunk
rules are applied to assign chunk tags to these
finer tags. Our chunk tagged corpus is useful in
a Myanmar-to-English machine translation system.
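
The two-stage pipeline summarized above (normalization of POS tags into finer tags, followed by chunk rules) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names (N, V, PART, PART.SUBJ, PART.OBJ), the chunk labels (SUBJ, OBJ, VC, O), the placeholder words, and the rule tables are hypothetical and are not the tagset or rules of [8] or of this paper.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]          # (word, POS tag)
Chunk = Tuple[str, List[str]]    # (chunk label, words in the chunk)

# Hypothetical normalization rules: (coarse tag, word) -> finer tag.
NORMALIZATION_RULES: Dict[Tuple[str, str], str] = {
    ("PART", "p_subj"): "PART.SUBJ",
    ("PART", "p_obj"): "PART.OBJ",
}

# Hypothetical chunk rules: a sequence of finer tags -> chunk label.
# Rules are tried in the order listed; the first match wins.
CHUNK_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("N", "PART.SUBJ"), "SUBJ"),
    (("N", "PART.OBJ"), "OBJ"),
    (("V",), "VC"),
]


def normalize(tagged: List[Token]) -> List[Token]:
    """Replace a coarse tag with a finer tag when a rule applies."""
    return [(w, NORMALIZATION_RULES.get((t, w), t)) for w, t in tagged]


def chunk(tagged: List[Token]) -> List[Chunk]:
    """Scan left to right, applying the first chunk rule that matches."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(tagged):
        for pattern, label in CHUNK_RULES:
            span = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in span) == pattern:
                chunks.append((label, [w for w, _ in span]))
                i += len(pattern)
                break
        else:
            # No rule matched: emit the token as an outside ("O") chunk.
            chunks.append(("O", [tagged[i][0]]))
            i += 1
    return chunks


if __name__ == "__main__":
    sentence = [("n1", "N"), ("p_subj", "PART"),
                ("n2", "N"), ("p_obj", "PART"), ("v1", "V")]
    for label, words in chunk(normalize(sentence)):
        print(label, words)
```

In this sketch the chunk rules refer only to the finer tags, so without the normalization step the coarse PART tag would match no rule; this is one way to read why normalization is described as a preprocessing step.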