Myanmar Word Stemming and POS Tagging using Rule Based Approach

Minn, Kyaw Htet

dc.contributor.author	Minn, Kyaw Htet
dc.date.accessioned	2019-09-23T04:41:56Z
dc.date.available	2019-09-23T04:41:56Z
dc.date.issued	2019-03
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/2242
dc.description.abstract	Myanmar is the official language of the Republic of the Union of Myanmar and is used for spoken and written communication by more than 33 million people. With the rapid growth of digital content in the Myanmar language, applications such as machine learning, translation and information retrieval have become popular, and they require effective Natural Language Processing (NLP). NLP for the Myanmar language still faces significant challenges. Word segmentation, stemming and Part-of-Speech (POS) tagging are pre-processing steps in text mining applications and common requirements of NLP functions; they are especially important in most information retrieval systems. The main objectives of this thesis are to study the morphology of Myanmar words, to implement n-gram based word segmentation, and to propose grammatical stemming rules and POS tagging rules for the Myanmar language. The thesis proposes word segmentation, stemming and POS tagging based on the n-gram method and a rule-based stemming method that can cope with the challenges of Myanmar NLP tasks. The system not only generates segmented words but also produces stemmed words with POS tags by removing prefixes, infixes and suffixes. It achieves 82% accuracy. The data were collected from several online sources, and the system is implemented in Python.	en_US
dc.language.iso	en_US	en_US
dc.publisher	University of Computer Studies, Yangon	en_US
dc.title	Myanmar Word Stemming and POS Tagging using Rule Based Approach	en_US
dc.type	Thesis	en_US
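The abstract describes stemming by removing prefixes, infixes and suffixes and assigning a POS tag from the removed affix. The sketch below illustrates that general idea only; the affix lists, tag names and the function stem_and_tag are hypothetical and are not the rule set proposed in the thesis. It assumes input words are already segmented, handles colloquial negation (မ + verb + ဘူး) as a prefix/suffix pair, and refuses to cut inside a stacked consonant.

```python
from typing import List, Tuple

# Hypothetical, illustrative rules only: NOT the rule set proposed in the thesis.
# Each entry: (suffix, POS tag assumed for the remaining stem).
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ခြင်း", "VERB"),  # nominalizer, e.g. သွားခြင်း ("going") -> သွား
    ("များ", "NOUN"),   # formal plural marker
    ("တွေ", "NOUN"),    # colloquial plural marker
    ("သည်", "VERB"),    # formal sentence-final verb particle
]
# Colloquial negation wraps the verb: မ + verb + ဘူး
NEG_PREFIX, NEG_SUFFIX = "မ", "ဘူး"

VIRAMA = "\u1039"  # marks a stacked consonant; never cut right after it


def is_consonant(ch: str) -> bool:
    """True for Myanmar consonants U+1000..U+1021, which begin a syllable."""
    return "\u1000" <= ch <= "\u1021"


def stem_and_tag(word: str) -> Tuple[str, str]:
    """Strip one affix pattern and return (stem, tag); tag is 'UNK' if no rule fires."""
    # Negation: strip မ only when the word also ends in ဘူး, so ordinary
    # words that merely start with မ (e.g. မြန်မာ) are left alone.
    if word.startswith(NEG_PREFIX) and word.endswith(NEG_SUFFIX):
        stem = word[len(NEG_PREFIX):-len(NEG_SUFFIX)]
        if stem and is_consonant(stem[0]):
            return stem, "VERB"
    # Try longer suffixes first so the most specific rule wins.
    for suffix, tag in sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Require a non-empty stem and a clean syllable boundary.
            if stem and not stem.endswith(VIRAMA):
                return stem, tag
    return word, "UNK"


if __name__ == "__main__":
    for w in ["သွားခြင်း", "ကျောင်းသားများ", "မသွားဘူး", "မြန်မာ"]:
        print(w, stem_and_tag(w))
```

This applies a single rule per word for clarity; a fuller rule-based stemmer would typically iterate until no rule fires and check candidate stems against a lexicon, and its tag would be combined with the output of a segmentation step rather than used alone.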