Abstract:
Language acquisition for robots is a challenging topic
in artificial intelligence research and is essential
for natural communication between robots and humans.
In this paper, we propose a method for acquiring language directly
from motion videos and user utterances using multimodal
machine learning, without prior linguistic knowledge
or language-specific information. Translation between
the acquired conceptual structures and syllable sequences of
a human language (e.g., Japanese) is carried
out by applying machine translation methods,
including sequence-to-sequence learning. Experiments
on language acquisition with 500 videos show that the
Encoder-Decoder and Encoder-Decoder with Attention models
achieve translation performance equal to that of baselines
that were prepared manually.