Abstract:
Myanmar texts are different from
English texts in that they have no spaces to
mark the boundaries of words. The absence of
word delimiters makes Myanmar word
segmentation a difficult task and complicates
the processing of Myanmar text.
To segment Myanmar words, systems
typically use knowledge-based methods and
large lexicons. This paper demonstrates the
ability of linear-chain conditional random
fields (CRFs) to perform Myanmar word
segmentation when provided with a lexicon. This
paper also presents a probabilistic new
word detection method based on accessor
variety (AV) statistics and the
forward-backward algorithm. The system
constructs a lexicon to improve new word
identification.
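
As a rough illustration of the AV statistic mentioned above, the following minimal sketch computes it under a commonly used definition: the AV of a substring is the smaller of the number of distinct units seen immediately before it and the number seen immediately after it, with each sentence boundary counted as a distinct context. The abstract does not state the exact definition used in this work, so the function name `accessor_variety`, the `max_len` parameter, and the toy Latin-letter corpus are all assumptions made for illustration; for Myanmar text the units would more likely be characters or syllables.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=3):
    """Return {substring: AV} for all substrings up to max_len units long.

    Assumed definition: AV(s) = min(distinct left contexts, distinct right
    contexts). This is a sketch, not necessarily the paper's exact formula.
    """
    left = defaultdict(set)   # units observed immediately before each substring
    right = defaultdict(set)  # units observed immediately after each substring
    for idx, sent in enumerate(sentences):
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                # A sentence boundary counts as a distinct context per sentence.
                left[s].add(sent[i - 1] if i > 0 else ("<S>", idx))
                right[s].add(sent[j] if j < n else ("<E>", idx))
    return {s: min(len(left[s]), len(right[s])) for s in left}


# Toy corpus (hypothetical): "bc" follows 'a' or 'x' and precedes 'd', 'y' or 'e'.
av = accessor_variety(["abcd", "xbcy", "abce"])
print(av["bc"])  # 2
```

A substring that appears in many different contexts receives a high AV and is a more plausible word candidate, which is why such scores are useful for new word detection.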