Abstract:
Part-of-speech (POS) tagging is the process
of assigning a part-of-speech tag or other
lexical class marker to each word in a
sentence. In many Natural Language Processing
(NLP) applications such as word sense
disambiguation, information retrieval,
information processing, parsing, question
answering, and machine translation, POS
tagging is considered one of the basic
necessary tools. Identifying the ambiguities in
lexical items is a challenging objective in
developing an efficient and accurate POS
tagger. This paper proposes a POS tagger and a
POS tagset for the Myanmar language, which are
essential computational linguistic tools needed
for many NLP applications.
Since most previous work on HMM-based
tagging considers only part-of-speech
information in context, such models cannot
utilize lexical information, which is crucial for
resolving some morphological ambiguities. In this
paper, a simple method to build Lexicalized
Hidden Markov Models (L-HMMs) is introduced
to improve the accuracy of part-of-speech
tagging for the Myanmar language. In experiments,
lexicalized models achieve higher accuracy than
non-lexicalized models.
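
The contrast between the two model families can be illustrated with a
minimal sketch. The abstract does not state which lexical context the
L-HMM conditions on, so the code below assumes one common choice: the
tag transition is conditioned on the previous word as well as the
previous tag, and is interpolated with the ordinary tag-only
transition. The toy English corpus, the `START` symbol, the function
names, and the values of `lam` and `alpha` are all hypothetical and are
not taken from the paper.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol, used as both word and tag


def train(tagged_sentences):
    """Count emissions, tag-only transitions, and word-conditioned transitions."""
    names = ("tag", "emit", "ctx", "trans", "lex_ctx", "lex_trans")
    c = {name: Counter() for name in names}
    for sentence in tagged_sentences:
        prev_word, prev_tag = START, START
        for word, tag in sentence:
            c["tag"][tag] += 1
            c["emit"][(tag, word)] += 1
            c["ctx"][prev_tag] += 1
            c["trans"][(prev_tag, tag)] += 1
            c["lex_ctx"][(prev_tag, prev_word)] += 1
            c["lex_trans"][(prev_tag, prev_word, tag)] += 1
            prev_word, prev_tag = word, tag
    return c


def transition_prob(c, tags, prev_tag, prev_word, tag, lexicalized,
                    lam=0.7, alpha=0.1):
    """P(tag | prev_tag), optionally interpolated with P(tag | prev_tag, prev_word)."""
    plain = (c["trans"][(prev_tag, tag)] + alpha) / (
        c["ctx"][prev_tag] + alpha * len(tags))
    seen = c["lex_ctx"][(prev_tag, prev_word)]
    if not lexicalized or seen == 0:
        return plain  # back off to the tag-only estimate for unseen contexts
    lexical = c["lex_trans"][(prev_tag, prev_word, tag)] / seen
    return lam * lexical + (1 - lam) * plain


def emission_prob(c, vocab_size, tag, word, alpha=0.1):
    """Add-alpha smoothed P(word | tag)."""
    return (c["emit"][(tag, word)] + alpha) / (c["tag"][tag] + alpha * vocab_size)


def viterbi(words, c, tags, vocab_size, lexicalized):
    """Return the highest-scoring tag sequence for the given words."""
    best = {START: (0.0, [])}  # last tag -> (log-probability, best path so far)
    prev_word = START
    for word in words:
        step = {}
        for tag in tags:
            emit = math.log(emission_prob(c, vocab_size, tag, word))
            step[tag] = max(
                (score
                 + math.log(transition_prob(c, tags, pt, prev_word, tag, lexicalized))
                 + emit,
                 path + [tag])
                for pt, (score, path) in best.items()
            )
        best, prev_word = step, word
    return max(best.values())[1]


# Toy data (assumption: illustrative English, not the paper's Myanmar corpus).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
counts = train(corpus)
tags = sorted(counts["tag"])
vocab_size = len({word for sentence in corpus for word, _ in sentence})

print(viterbi(["the", "dog", "sleeps"], counts, tags, vocab_size, lexicalized=True))
```

Setting `lexicalized=False` gives the ordinary tag-only HMM with the
same emissions, so the two variants differ only in the transition term.
Because the previous word is always observed, the lexicalized variant
keeps the same state space and decoding cost as the plain model.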