A Study of Myanmar Word Segmentation Schemes for Statistical Machine Translation

Thu, Ye Kyaw; Finch, Andrew; Sagisaka, Yoshinori; Sumita, Eiichiro

dc.contributor.author	Thu, Ye Kyaw
dc.contributor.author	Finch, Andrew
dc.contributor.author	Sagisaka, Yoshinori
dc.contributor.author	Sumita, Eiichiro
dc.date.accessioned	2019-10-23T13:25:31Z
dc.date.available	2019-10-23T13:25:31Z
dc.date.issued	2013-02-26
dc.identifier.uri	http://onlineresource.ucsy.edu.mm/handle/123456789/2335
dc.description.abstract	Myanmar sentences are written as contiguous sequences of syllables with no characters delimiting the words. In statistical machine translation (SMT), word segmentation is a necessary step for languages that do not naturally delimit words. Myanmar is a low-resource language and therefore it is difficult to develop a good word segmentation tool based on machine learning techniques. In this paper, we examine various word segmentation schemes and their effect on the translation from Myanmar to seven other languages. We performed experiments based on character segmentation, syllable segmentation, human lexical/phrasal segmentation, and unsupervised/supervised word segmentation. The results show that the highest quality machine translation was attained with syllable segmentation, and we found this effect to be greatest for translation into subject-object-verb (SOV) structured languages such as Japanese and Korean. Approaches based on machine learning were unable to match this performance for most language pairs, and we believe this was due to the lack of linguistic resources. However, a machine learning approach that extended syllable segmentation produced promising results and we expect this can be developed into a viable method as more data becomes available in the future.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Eleventh International Conference On Computer Applications (ICCA 2013)	en_US
dc.title	A Study of Myanmar Word Segmentation Schemes for Statistical Machine Translation	en_US
dc.type	Article	en_US