Abstract:
We present a grapheme cluster segmentation tool for Myanmar text based on our proposed Positional Prediction text input concept. The motivation for this research is to build a Positional Prediction database of Myanmar consonants from existing Myanmar electronic documents such as PDF e-books and Microsoft Word documents. In this paper, we introduce the segmentation rule and the implementation of Positional Prediction combination pattern segmentation and character segmentation. We also describe the difficulties of converting text from legacy ASCII-based font encodings to Unicode, the development process, and initial study results obtained from the content of a 62-page Myanmar e-book. This grapheme cluster segmentation approach is useful not only for creating the Positional Prediction database but also for statistical analysis of the distributions of characters and Positional Prediction patterns in the Myanmar language.