Abstract:
A variety of Natural Language Processing (NLP)
tasks, such as named entity recognition, stemming,
question answering and machine translation, benefit
from knowledge of words' syntactic categories, or
Part-of-Speech (POS). POS taggers must therefore
assign a single best POS to every word in a corpus.
This paper presents the development of
Part-of-Speech tagged text corpora using a
bigram part-of-speech tagger. POS tagging is a
process of assigning an appropriate syntactic category
to each word in a sentence. To apply the bigram model
to automated tagging, we built an adequate
annotated corpus from scratch. We used a
customized POS tagset to annotate the words in
Myanmar sentences. Our bigram tagger has two
phases: training with Hidden Markov Models (HMM)
and decoding with the Viterbi algorithm.
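
The two phases can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names (`train`, `log_prob`, `viterbi`), the `START` pseudo-tag, the add-one smoothing, and the tiny English toy corpus with tags `DET`, `N`, `V` are all assumptions chosen only to keep the example short and runnable.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag (an assumption)


def train(tagged_sentences):
    """Training phase: count tag-bigram transitions and tag-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Decoding phase: return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words

    # best[i][t]: best log score of any tag path ending in tag t at position i
    best = [{
        t: log_prob(trans[START], t, n_tags) + log_prob(emit[t], words[0], n_words)
        for t in tags
    }]
    back = []  # back[i][t]: best previous tag for tag t at position i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_prob(trans[p], t, n_tags))
            scores[t] = (best[-1][prev]
                         + log_prob(trans[prev], t, n_tags)
                         + log_prob(emit[t], word, n_words))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy corpus (illustrative only, not data from this paper).
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
trans, emit, vocab = train(corpus)
print(viterbi(["the", "cat", "barks"], trans, emit, vocab))
```

Scores are kept in log space so that long sentences do not underflow, and each word keeps one back-pointer per tag, so decoding a sentence of n words over T tags takes on the order of n * T^2 steps.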