Abstract:
Word segmentation is a basic task and an important problem in natural
language processing. In the Myanmar language, words composed of one or more
syllables are usually not separated by white space. Myanmar word segmentation
determines the boundaries of words in text written without word separators.
This system uses a two-step longest-matching approach. The first step is
syllable segmentation; the second uses a hybrid approach that combines
left-to-right syllable maximum matching with hierarchical expectation
maximization. The system is intended for use as a pre-processing tool in Myanmar
text processing applications such as machine translation, information
retrieval, and search engines for the Myanmar
language. The experimental results show 93% accuracy on a collection of
300 articles from the business, entertainment and sports sections of the Myanmar
newspaper, totalling nearly 35,000 words. The proposed word segmentation is
implemented as a web-based tool in C# (.NET).
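
The left-to-right syllable maximum matching named above can be illustrated with
a minimal sketch. It is written in Python for brevity and is not the authors'
C# implementation. It assumes syllable segmentation has already been done, uses
a hypothetical romanized lexicon, and applies an assumed cap of four syllables
per word. The hierarchical expectation maximization part of the hybrid approach
is not covered.

```python
def max_match(syllables, lexicon, max_len=4):
    """Greedy left-to-right longest matching over a syllable sequence.

    syllables: list of syllable strings (output of the first step).
    lexicon:   set of known words, each a concatenation of syllables.
    max_len:   assumed upper bound on syllables per word (hypothetical).
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until the lexicon matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback,
            # so the loop is guaranteed to advance.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy data: placeholder syllables, not real Myanmar text.
syllables = ["ka", "la", "ma", "ta", "ka"]
lexicon = {"kala", "kalama", "taka"}
print(max_match(syllables, lexicon))  # ['kalama', 'taka']
```

Because the match is greedy, "kalama" is chosen over the shorter lexicon entry
"kala". Ambiguities of this kind are presumably where the second component of
the hybrid approach is meant to help.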