Assigning automatically Part-of-Speech tags to build tagged corpus for Myanmar language

Myint, Phyu Hninn

Conference: Fifth Local Conference on Parallel and Soft Computing

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/1262

Date: 2010-12-16

Abstract:

A variety of Natural Language Processing (NLP) tasks, such as named entity recognition, stemming, question answering and machine translation, benefit from knowledge of words' syntactic categories, or Part-of-Speech (POS) tags. POS tagging is the process of assigning an appropriate syntactic category to each word in a sentence, and a POS tagger must assign a single best POS to every word in a corpus. This paper presents the development of a POS-tagged text corpus using a bigram part-of-speech tagger. To apply the bigram model to automated tagging, we built an adequate annotated corpus from scratch. We used a customized POS tagset to annotate the words in Myanmar sentences. Our bigram tagger has two phases: training with a Hidden Markov Model (HMM) and decoding with the Viterbi algorithm.
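
The abstract names the two phases but gives no implementation details, so the following is only a minimal sketch of a generic bigram HMM tagger, not the paper's system. It assumes add-one smoothing, a hypothetical three-tag tagset (DET, NOUN, VERB) and a toy English corpus standing in for annotated Myanmar sentences. The names `train`, `log_prob` and `viterbi` are illustrative.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence (assumption)


def train(tagged_sentences):
    """Training phase: count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, given, event, n_outcomes):
    """Add-one smoothed log P(event | given); smoothing is an assumption here."""
    row = counts.get(given, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_outcomes))


def viterbi(words, model):
    """Decoding phase: return the most probable tag sequence for `words`."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans, START, t, n_tags) + log_prob(emit, t, words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit, t, words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy annotated corpus (hypothetical data, not from the paper).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train(corpus)
print(viterbi(["a", "cat", "barks"], model))
```

Working in log space avoids numerical underflow on long sentences. A tagger built on a real corpus would likely need a more careful treatment of unknown words than add-one smoothing.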
