Abstract:
The Myanmar language is spoken by more than 33
million people and is used as the official language of the
Republic of the Union of Myanmar in both spoken and
written communication. With the rapid growth of
digital content in the Myanmar language, applications
such as machine learning, translation, and information
retrieval have become popular, creating a need for
effective Natural Language Processing (NLP)
studies. The main objective of this paper is to study
Myanmar word morphology, to implement n-gram
based word segmentation, and to propose
grammatical stemming rules and POS tagging rules
for the Myanmar language. To this end, this paper proposes
word segmentation, stemming, and POS tagging
based on an n-gram method and a rule-based stemming
method that can cope with the challenges of
Myanmar NLP tasks. The proposed system not only
produces segmented words but also produces
stemmed words with POS tags by removing prefixes,
infixes, and suffixes. The proposed system achieves
80% to 85% accuracy. The data are collected from
several online sources, and the system is implemented
in Python.
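
As a rough illustration of the pipeline described above, the following minimal sketch combines longest-match segmentation over syllable n-grams with suffix-stripping rules that also assign a POS tag. It is not the paper's implementation: the lexicon entries, the romanized syllables, the suffix rules, the tag names, and the function names `segment` and `stem_and_tag` are all hypothetical placeholders, the input is assumed to be already split into syllables, and only suffixes are handled (prefix and infix rules are omitted).

```python
# Toy lexicon of known words, each stored as a tuple of romanized syllables.
# All entries are hypothetical placeholders, not data from the paper.
LEXICON = {
    ("kyaung", "thar", "myar"),
    ("kyaung", "thar"),
    ("thwar", "thi"),
}

# Hypothetical suffix rules: final syllable -> POS tag of the remaining stem.
SUFFIX_RULES = {"myar": "NOUN", "thi": "VERB"}

MAX_N = 3  # longest syllable n-gram tried against the lexicon (assumed value)


def segment(syllables, lexicon=LEXICON, max_n=MAX_N):
    """Greedy longest-match segmentation over syllable n-grams."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest n-gram first, then back off to shorter ones.
        for n in range(min(max_n, len(syllables) - i), 0, -1):
            gram = tuple(syllables[i:i + n])
            # A single syllable is always accepted as a fallback word.
            if n == 1 or gram in lexicon:
                words.append(gram)
                i += n
                break
    return words


def stem_and_tag(word, rules=SUFFIX_RULES):
    """Strip one known suffix syllable and return (stem, POS tag)."""
    if len(word) > 1 and word[-1] in rules:
        return word[:-1], rules[word[-1]]
    return word, "UNK"  # no rule matched; the word is left unchanged


if __name__ == "__main__":
    sentence = ["kyaung", "thar", "myar", "thwar", "thi"]
    for word in segment(sentence):
        print(word, "->", stem_and_tag(word))
```

The greedy back-off from longer to shorter n-grams keeps the sketch short; a statistical n-gram model would instead score competing segmentations, which this example does not attempt.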