Chunk Tagged Corpus Creation for Myanmar Language

Myint, Phyu Hninn; Htwe, Tin Myat; Thein, Ni Lar

Ninth International Conference On Computer Applications (ICCA 2011)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/160

Date: 2011-05-05

Abstract:

In natural language processing (NLP) applications, sentence analysis is an important phase of machine translation systems. At present, no mature deep analysis is available for the Myanmar language. Chunk identification is a fundamental task in the shallow parsing of sentences. The creation of a POS tagged corpus was proposed in [8]; in this paper, we propose a methodology for building a chunk tagged corpus for the Myanmar language. We use the POS tagged corpus proposed in [8] and identify chunks in Myanmar POS tagged texts. Our approach is rule-based and identifies all chunks in a Myanmar sentence. As a preprocessing step, the POS tags must be normalized to produce finer tags, so normalization rules are also developed. After normalization, chunk rules are applied to these finer tags to assign chunk tags. The resulting chunk tagged corpus is useful for Myanmar-to-English machine translation systems.
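
The two-stage pipeline described in the abstract (normalize POS tags into finer tags, then apply chunk rules over those finer tags) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the tag names (`PRN`, `NN`, `PPM`, `VB`, `SF`), the finer tags, the chunk labels, the rule tables, and the romanized example tokens are all hypothetical placeholders and are not taken from the paper or from the corpus in [8].

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]          # (word, POS tag)
Chunk = Tuple[str, List[str]]    # (chunk label, words in the chunk)

# Hypothetical normalization rules: (coarse POS tag, word) -> finer tag.
# The real rules and tagset are defined in the paper, not here.
NORMALIZATION_RULES: Dict[Tuple[str, str], str] = {
    ("PPM", "ka"): "PPM.Subj",
    ("PPM", "ko"): "PPM.Obj",
}

# Hypothetical chunk rules: a sequence of finer tags -> chunk label.
CHUNK_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("PRN", "PPM.Subj"), "NC.Subj"),
    (("NN", "PPM.Obj"), "NC.Obj"),
    (("VB", "SF"), "VC"),
]


def normalize(tokens: List[Token]) -> List[Token]:
    """Replace a coarse tag with a finer one when a rule matches."""
    return [(w, NORMALIZATION_RULES.get((t, w), t)) for w, t in tokens]


def chunk(tokens: List[Token]) -> List[Chunk]:
    """Scan left to right, applying the first chunk rule that matches."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(tokens):
        for pattern, label in CHUNK_RULES:
            window = tokens[i:i + len(pattern)]
            if tuple(t for _, t in window) == pattern:
                chunks.append((label, [w for w, _ in window]))
                i += len(pattern)
                break
        else:
            # No rule matched: emit the token outside any chunk.
            chunks.append(("O", [tokens[i][0]]))
            i += 1
    return chunks


# Illustrative romanized tokens (placeholders, not corpus data).
sentence: List[Token] = [
    ("thu", "PRN"), ("ka", "PPM"),
    ("saoat", "NN"), ("ko", "PPM"),
    ("phat", "VB"), ("thi", "SF"),
]
result = chunk(normalize(sentence))
```

Under these assumed rules, the same coarse tag `PPM` is split into two finer tags before chunking, which is what lets the chunk rules tell a subject chunk from an object chunk.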
