Abstract:
Corpora must be developed to build statistical
language models; however, corpus development is
labor-intensive and very expensive. Corpus resources
are often scarce in domains such as a new language, a
new task, or a new application. This situation is called
"under-resourced." Myanmar is one of the under-resourced
languages for Japanese computational linguists, and
vice versa. For both groups, their mother tongues are
"well-resourced." Fortunately, since Japanese and
Myanmar share the same Subject-Object-Verb (SOV)
word order, higher translation quality can be expected
between these two languages. Thus, Japanese and
Myanmar can serve as mediator (or hub) languages for
each other toward other languages. This paper proposes
the reutilization of well-resourced domain data to
compensate for under-resourced domain data. As a
feasible example, this paper introduces the essence of my
past research [1, 2] on model adaptation with
machine-translated text.
Description:
I would like to express my sincere thanks to Dr. Ye
Kyaw Thu for useful discussions on the basics of the
Myanmar language, to Dr. Satoshi Takahashi at NTT
Media Intelligence Lab. and Professor Yoshinori
Sagisaka at Waseda University for their warm
encouragement, and to NTT Media Intelligence Lab. for
their support.