Word Segmentation and New Word Identification of Myanmar Language

Soe, Ei Phyu Phyu

Tenth International Conference On Computer Applications (ICCA 2012)

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2262

Date: 2012-02-28

Abstract:

Unlike English texts, Myanmar texts have no spaces or other delimiters to mark word boundaries, which makes Myanmar word segmentation a difficult task. To segment Myanmar words, systems typically rely on knowledge-based methods and large lexicons. This paper examines the ability of linear-chain conditional random fields (CRFs), supplied with a lexicon, to perform Myanmar word segmentation. It also presents a probabilistic new word detection method based on access variety (AV) statistics and the forward-backward algorithm. The system constructs the lexicon to improve new word identification.
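
To illustrate the AV statistic mentioned in the abstract, the sketch below computes a common formulation (often called accessor variety in the literature): for each candidate substring, take the minimum of the number of distinct units seen immediately before it and immediately after it. This is not the paper's implementation. The function name `accessor_variety`, the `max_len` parameter, the boundary markers `<S>` and `</S>`, and the toy Latin-letter corpus are all assumptions made for illustration; a Myanmar system would most likely operate on syllables rather than single characters, and may count sentence boundaries differently.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=4):
    """Return {substring: AV} for all substrings up to max_len units.

    AV(s) = min(number of distinct left neighbours of s,
                number of distinct right neighbours of s).
    Assumption: all sentence starts share one marker "<S>" and all
    sentence ends share one marker "</S>".
    """
    left = defaultdict(set)   # substring -> set of units seen before it
    right = defaultdict(set)  # substring -> set of units seen after it
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                left[s].add(sent[i - 1] if i > 0 else "<S>")
                right[s].add(sent[j] if j < n else "</S>")
    return {s: min(len(left[s]), len(right[s])) for s in left}


# Toy corpus (hypothetical): "bc" appears in three different contexts.
av = accessor_variety(["abcd", "xbcy", "zbcw"])
print(av["bc"], av["b"])
```

A substring that occurs in many different contexts on both sides, such as `bc` in the toy corpus, receives a high AV and is a plausible word candidate. A substring such as `b`, which is always followed by the same unit, receives a low AV.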