Joint Word Segmentation and Part-of-Speech Tagging for Myanmar Language

Cing, Dim Lam; Soe, Khin Mar

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2530

Date: 2020-08

Abstract:

Word segmentation and POS tagging are active research areas, and the two tasks are usually developed separately with a variety of methods. Separate word segmenters and POS taggers are available for the Myanmar language, based on computational methods such as Neural Networks (NN) and Hidden Markov Models (HMM). No prior research has addressed joint word segmentation and POS tagging for the Myanmar language. This research therefore develops joint Myanmar word segmentation and POS tagging based on a Hidden Markov Model and morphological rules. A systematic linguistic study of the language's morphology is important for revealing words that are significant to users such as historians and linguists. Because Myanmar writing does not require explicit spaces between words, the first processing step is to break the text into units called tokens, each of which is either a word or something else such as a number. In word segmentation and POS tagging, the morphological structure of words is the main source of information for tagging correctly. Using the morphological structure of a word, irrelevant tags can be eliminated and the suitable tag can be found for that word. Morphological analysis is therefore an important part of language engineering applications, especially for a morphologically rich and complex language like Myanmar. Most current research on the Myanmar language uses a lexicon, dictionary, or corpus that lists all the words for word segmentation as an initial stage of processing. The proposed system uses an HMM and morphological rules for word segmentation and POS tagging. The evaluation shows that the system achieves an accuracy of 94%.
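As a rough illustration of what joint segmentation and tagging with an HMM can look like, the sketch below runs a Viterbi search over every (word boundary, tag) pair at once, so the segmentation and the tag sequence are chosen together. It is not the authors' implementation: the Latin-letter "words", the tags `N` and `V`, all probabilities, and the names `EMIT`, `TRANS`, `MAX_WORD_LEN`, and `joint_segment_and_tag` are hypothetical, and the morphological rules described in the abstract are not modeled.

```python
import math

# Hypothetical toy model: every word, tag, and probability here is invented
# for illustration and does not come from the paper.
EMIT = {  # P(word | tag)
    "N": {"ab": 0.5, "abc": 0.3, "d": 0.2},
    "V": {"cd": 0.6, "d": 0.4},
}
TRANS = {  # P(tag | previous tag); "<s>" marks the sentence start
    "<s>": {"N": 0.8, "V": 0.2},
    "N": {"N": 0.3, "V": 0.7},
    "V": {"N": 0.6, "V": 0.4},
}
MAX_WORD_LEN = 3  # longest candidate word, in characters


def joint_segment_and_tag(text):
    """Return the most probable list of (word, tag) pairs, or None if no path exists."""
    n = len(text)
    # best[i][tag] = (log-probability, start index of the last word, previous tag)
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)

    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            for tag, emissions in EMIT.items():
                if word not in emissions:
                    continue  # this tag cannot emit the candidate word
                for prev_tag, (prev_score, _, _) in best[start].items():
                    p_trans = TRANS.get(prev_tag, {}).get(tag, 0.0)
                    if p_trans == 0.0:
                        continue
                    score = prev_score + math.log(p_trans) + math.log(emissions[word])
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, start, prev_tag)

    if not best[n]:
        return None  # no segmentation covers the whole input

    # Backtrace from the best final tag to recover words and tags together.
    tag = max(best[n], key=lambda t: best[n][t][0])
    result, end = [], n
    while end > 0:
        _, start, prev_tag = best[end][tag]
        result.append((text[start:end], tag))
        end, tag = start, prev_tag
    return result[::-1]


print(joint_segment_and_tag("abcd"))
```

In this toy model the input `abcd` can be split as `ab` + `cd` or as `abc` + `d`; the search scores both splits under every tag assignment and keeps the single best path. A rule-based step like the one the abstract describes could, under the same assumptions, be added as a filter that skips tags a word's morphological structure rules out before scoring.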
