Myanmar Word Stemming and POS Tagging using Rule Based Approach

Minn, Kyaw Htet

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2242

Date: 2019-03

Abstract:

The Myanmar language, the official language of the Republic of the Union of Myanmar, is spoken by more than 33 million people, who use it for both verbal and written communication. With the rapid growth of digital content in the Myanmar language, applications such as machine learning, translation and information retrieval have become popular, and they require effective Natural Language Processing (NLP) studies. NLP for the Myanmar language remains a major challenge. Segmentation, stemming and Part-of-Speech (POS) tagging are pre-processing steps in text mining applications as well as very common requirements of NLP functions, and they are especially important in most information retrieval systems. The main objectives of this thesis are to study the morphology of Myanmar words, to implement n-gram based word segmentation, and to propose grammatical stemming rules and POS tagging rules for the Myanmar language. The thesis proposes word segmentation, stemming and POS tagging based on an n-gram method and a rule-based stemming method that can cope with the challenges of Myanmar NLP tasks. The system generates not only the segmented words but also the stemmed words with their POS tags, by removing prefixes, infixes and suffixes. It achieves 82% accuracy. The data were collected from several online sources, and the system is implemented in Python.
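To illustrate the idea of rule-based stemming with POS tagging, the following is a minimal sketch that covers suffix stripping only. The rule table `SUFFIX_RULES`, the function name `stem_and_tag` and the tag labels are assumptions made for illustration; they are not the rules or the tag set proposed in the thesis, which also handles prefixes and infixes.

```python
# Hypothetical suffix rules: (suffix, POS tag suggested for the remaining stem).
# These entries are illustrative only and are not taken from the thesis.
SUFFIX_RULES = [
    ("များကို", "NOUN"),  # plural marker followed by object marker
    ("များ", "NOUN"),     # plural marker
    ("ကို", "NOUN"),      # object marker
]


def stem_and_tag(word, rules=SUFFIX_RULES, default_tag="UNK"):
    """Strip the longest matching suffix and return (stem, tag)."""
    # Try longer suffixes first so that compound endings are removed whole.
    for suffix, tag in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        # Require a non-empty stem so a bare particle is not reduced to "".
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], tag
    # No rule matched: return the word unchanged with a fallback tag.
    return word, default_tag


if __name__ == "__main__":
    for w in ["စာအုပ်များ", "စာအုပ်များကို", "များ"]:
        print(w, "->", stem_and_tag(w))
```

Trying the longest suffix first is one simple design choice; a real rule set would likely also need rule ordering, exceptions and a lexicon check on the resulting stem.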
