Abstract:
In this paper, we describe joint word segmentation
and stemming of Myanmar sentences with
syllable-based tagging under the Conditional
Random Fields (CRF) framework. A manually segmented
corpus was developed to train the
segmenter, which we implemented as 7-tag
syllable-based tagging and stemming with
CRF. The trained CRF segmenter was
compared with a baseline approach based on longest
matching that used a dictionary extracted from the
manually segmented corpus. Our approach
achieves performance comparable to that of a 4-tag
syllable tagging approach. The experimental results
show that the CRF with the 7-tag set and the word
feature improves stemming performance.
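
For readers unfamiliar with the longest-matching baseline mentioned above, the following is a minimal sketch of greedy longest matching over a syllable sequence. It is illustrative only and is not the authors' implementation: the function name, the `max_len` window, and the toy Latin-letter "syllables" are assumptions, and the fallback of emitting an unmatched syllable as a one-syllable word is one common convention rather than something stated in the paper.

```python
def longest_match_segment(syllables, dictionary, max_len=6):
    """Greedy left-to-right longest matching over a list of syllables.

    `dictionary` is a set of known words (concatenated syllables).
    `max_len` (hypothetical parameter) caps the candidate length in syllables.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # Assumed fallback: a single unmatched syllable becomes its own word.
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


# Toy example with placeholder syllables, not real Myanmar data.
toy_dictionary = {"ab", "abc", "d"}
print(longest_match_segment(["a", "b", "c", "d"], toy_dictionary))
```

In the setting described in the abstract, the dictionary would be the word list extracted from the manually segmented corpus, and the input would be a sentence already split into syllables.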