Abstract:
Word segmentation and part-of-speech (POS) tagging are active research areas,
and a variety of methods have been developed for each. Separate word segmenters
and POS taggers are also available for the Myanmar language, based on
computational methods such as Neural Networks (NN) and Hidden Markov Models
(HMM). However, there is no research on joint word segmentation and POS tagging
for the Myanmar language. This research therefore aims to develop joint Myanmar
word segmentation and POS tagging based on a Hidden Markov Model and
morphological rules. A systematic linguistic study of the morphology of the
language is important for revealing words that are significant to users such as
historians and linguists.
Because the Myanmar writing style does not require explicit spaces between
words, the first processing step is to break the text into units called tokens,
each of which is either a word or another unit such as a number. In word
segmentation and POS tagging, the morphological structure of words is the main
source of information for tagging correctly. Using the morphological structure
of words, irrelevant tags can be eliminated and the suitable tag found for each
word. Therefore, morphological analysis is an important part of language
engineering applications, especially for a morphologically rich and complex
language like Myanmar.
Most current research on the Myanmar language relies on a lexicon, dictionary,
or corpus that lists all the words in order to perform word segmentation as an
initial stage of processing. The proposed system uses an HMM and morphological
rules for word segmentation and POS tagging. The evaluation results show that
the system achieves an accuracy of 94%.
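
To illustrate the general idea of joint segmentation and tagging with an HMM,
the following is a minimal sketch, not the proposed system itself. It runs a
Viterbi search over all candidate words ending at each character offset, so the
word boundaries and the tags are chosen together. The tag set, the
probabilities, the Latin-letter toy words, and the function name
joint_segment_and_tag are all hypothetical; in particular, the small EMIT table
only stands in for whatever word candidates and tag constraints the
morphological rules would supply.

```python
import math

# Hypothetical toy model: every tag, word, and probability below is an
# illustrative assumption, not a value taken from the proposed system.
TAGS = ["N", "V"]
START = {"N": 0.7, "V": 0.3}                    # P(tag | sentence start)
TRANS = {("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.6, ("V", "V"): 0.4}      # P(tag | previous tag)
EMIT = {("N", "ab"): 0.5, ("N", "a"): 0.2, ("N", "c"): 0.3,
        ("V", "bc"): 0.6, ("V", "c"): 0.4}      # P(word | tag)
MAX_WORD_LEN = 2                                # longest candidate word


def joint_segment_and_tag(text):
    """Return the best [(word, tag), ...] for text, or None if none exists."""
    # best[(end, tag)] = (log-prob, back-pointer, word) for the best path
    # whose last word ends at character offset `end` and carries `tag`.
    best = {}
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            for tag in TAGS:
                emit = EMIT.get((tag, word))
                if emit is None:
                    continue  # this (word, tag) pair is not a candidate
                if start == 0:
                    candidates = [(math.log(START[tag]), None)]
                else:
                    candidates = [
                        (best[(start, prev)][0]
                         + math.log(TRANS[(prev, tag)]), (start, prev))
                        for prev in TAGS if (start, prev) in best
                    ]
                if not candidates:
                    continue  # no valid path reaches `start`
                score, back = max(candidates, key=lambda c: c[0])
                score += math.log(emit)
                key = (end, tag)
                if key not in best or score > best[key][0]:
                    best[key] = (score, back, word)

    # Pick the best state that covers the whole input, then backtrack.
    finals = [(best[(len(text), t)][0], (len(text), t))
              for t in TAGS if (len(text), t) in best]
    if not finals:
        return None
    _, key = max(finals)
    path = []
    while key is not None:
        _, back, word = best[key]
        path.append((word, key[1]))
        key = back
    return path[::-1]


print(joint_segment_and_tag("abc"))
```

Under these toy numbers, the input "abc" can be read as "ab" + "c" or as
"a" + "bc". The search compares both readings together with their tag
sequences and keeps the single most probable combination, which is the property
that distinguishes a joint approach from running a segmenter and a tagger in
sequence.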