UCSY's Research Repository

A Proposal on Statistical Language Model Building for Under-Resourced Languages

dc.contributor.author Nakajima, Hideharu
dc.date.accessioned 2019-07-04T05:42:26Z
dc.date.available 2019-07-04T05:42:26Z
dc.date.issued 2012-02-28
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/430
dc.description I would like to express my sincere thanks to Dr. Ye Kyaw Thu for a useful discussion on the basics of the Myanmar language, to Dr. Satoshi Takahashi at NTT Media Intelligence Lab. and Professor Yoshinori Sagisaka at Waseda University for their warm encouragement, and to NTT Media Intelligence Lab. for its support. en_US
dc.description.abstract Corpora must be developed to build statistical language models; however, developing them is labor-intensive and very expensive. Corpus resources are often scant for a new domain, such as a new language, a new task, or a new application. This situation is called “under-resourced.” Myanmar is likely one of the under-resourced languages for Japanese computational linguists, and vice versa, while for each group its own mother tongue is “well-resourced.” Fortunately, because Japanese and Myanmar share the same Subject-Object-Verb (SOV) word order, higher translation quality can be expected between these two languages. Thus, Japanese and Myanmar can each serve as a mediator (or hub) language connecting the other to further languages. This paper proposes reusing well-resourced domain data to offset the scarcity of under-resourced domain data. As a feasible example, this paper introduces the essence of my past research [1, 2] on model adaptation with machine-translated text. en_US
dc.language.iso en en_US
dc.publisher Tenth International Conference On Computer Applications (ICCA 2012) en_US
dc.title A Proposal on Statistical Language Model Building for Under-Resourced Languages en_US
dc.type Article en_US

