UCSY's Research Repository

Improving Myanmar Image Caption Generation Using NASNetLarge and Bi-directional LSTM

dc.contributor.author Aung, San Pa Pa
dc.contributor.author Pa, Win Pa
dc.contributor.author Nwe, Tin Lay
dc.date.accessioned 2022-06-20T08:00:39Z
dc.date.available 2022-06-20T08:00:39Z
dc.date.issued 2021-02-25
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2608
dc.description.abstract The main objective of this paper is to improve automatic Myanmar image caption generation by learning the contents of images with a NASNetLarge and Bi-LSTM model. Describing the contents of an image without human intervention is a complex task for a machine. Computer Vision and Natural Language Processing are widely used to tackle this problem. This paper proposes a deep learning-based Myanmar image captioning system that uses the NASNetLarge CNN feature extraction model as an encoder and a deep Recurrent Neural Network (RNN) with Bi-directional Long Short-Term Memory (LSTM) as a decoder. For corpus construction, we created and annotated a Myanmar image caption corpus (over 40k Myanmar sentences) based on the Flickr8k dataset. Furthermore, two levels of segmentation, word and syllable, are studied in the text preprocessing step. The proposed Bi-directional LSTM model is compared with LSTM, GRU, and the baseline model. Experiments on the updated dataset show that all of our models give BLEU scores with syllable segmentation that are comparable to or higher than those with word segmentation for the Myanmar image captioning system. NASNetLarge with the Bi-directional LSTM model using the syllable segmentation approach achieved the highest BLEU-4 score, 40.05%, which is 12.5% better than word segmentation in this work and 15.67% better than the BLEU-4 score of our previous work. en_US
dc.publisher ICCA en_US
dc.subject NASNetLarge, Recurrent Neural Network, Long Short-Term Memory, Gated Recurrent Unit en_US
dc.title Improving Myanmar Image Caption Generation Using NASNetLarge and Bi-directional LSTM en_US
dc.type Presentation en_US
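
The abstract reports results as BLEU-4 scores over word-level and syllable-level token sequences. As a rough illustration of what that metric measures, the following is a minimal sketch of sentence-level BLEU with uniform weights and no smoothing. It is not the authors' evaluation code: the function names `ngrams` and `bleu` are hypothetical, the tokens are placeholders standing in for segmented Myanmar words or syllables, and the paper may well have used corpus-level BLEU from an existing toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU, uniform weights, no smoothing (illustrative only)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Placeholder tokens; in the paper's setting these would be Myanmar words or syllables.
reference = ["t1", "t2", "t3", "t4", "t5"]
print(bleu(["t1", "t2", "t3", "t4", "t5"], [reference]))
print(bleu(["t1", "t2", "t3", "t4"], [reference]))
```

Because the score depends on how a caption is split into tokens, the same caption generally yields different BLEU values under word and syllable segmentation, which is the comparison the abstract describes.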

