dc.description.abstract |
The goal of text-to-image synthesis is to automatically create an image that
corresponds to a given text description: a computer model is trained to understand
natural language and translate it into visual representations. A key challenge of
text-to-image synthesis is the semantic gap between natural language and visual
representations. Natural language processing and computer vision techniques can be
used together to bridge this gap: natural language processing extracts meaningful
information from the textual input, while computer vision generates an image that
corresponds to it. Mapping textual input to visual representations in this way
helps to generate more accurate and meaningful images.
Text-to-image synthesis has gained popularity in recent years owing to advances
in deep learning. It has become an active research area in artificial intelligence
and has attracted researchers, practitioners, and the general public. However,
text-to-image synthesis for Myanmar is a challenging research problem because
several factors make generating images from textual descriptions difficult. One
such challenge is the scarcity of large-scale annotated datasets of Myanmar textual
descriptions and corresponding images. Therefore, a Myanmar caption corpus is
manually built based on the Oxford-102 Flowers dataset in order to train a Myanmar
text-to-image synthesis (T2I) model.
In this dissertation, a Myanmar T2I model is proposed using Generative
Adversarial Networks (GANs). Firstly, a DCGAN-based Myanmar T2I model is proposed
to create images from Myanmar text descriptions. However, this model can only
generate low-resolution images (64 x 64). For this reason, AttnGAN and DF-GAN are
investigated to determine which model can generate high-resolution images
(256 x 256) with semantic accuracy from Myanmar text descriptions. In this
comparison, DF-GAN gives the better result for Myanmar T2I.
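To make the conditioning mechanism concrete, the following is a minimal sketch of
a DCGAN-style text-conditional generator (in PyTorch, with hypothetical layer sizes
rather than the exact architecture used in this work): a sentence embedding from a
text encoder is concatenated with a noise vector and upsampled to a 64 x 64 image.

import torch
import torch.nn as nn

class TextConditionalGenerator(nn.Module):
    # DCGAN-style generator conditioned on a sentence embedding.
    # Sizes are illustrative: 100-d noise, 256-d text embedding, 64 x 64 RGB output.
    def __init__(self, noise_dim=100, text_dim=256, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            # (noise_dim + text_dim) x 1 x 1 -> (ngf*8) x 4 x 4
            nn.ConvTranspose2d(noise_dim + text_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            # -> (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            # -> (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            # -> ngf x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            # -> 3 x 64 x 64, pixel values in [-1, 1]
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, noise, text_emb):
        # Concatenate noise and caption embedding, treat it as a 1x1 feature map.
        z = torch.cat([noise, text_emb], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)

AttnGAN and DF-GAN follow the same conditioning idea but reach 256 x 256, AttnGAN
by attending over word-level embeddings at multiple stages and DF-GAN by fusing the
sentence embedding into the generator through deep fusion blocks.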
Moreover, DF-GAN+MSM (multimodal similarity model) is proposed in order to
generate semantically consistent images with precise shapes for the Myanmar
language, because the images generated by DF-GAN contain artifacts that need to be
improved. In DF-GAN+MSM, DF-GAN is applied to generate images from Myanmar text
descriptions, and the multimodal similarity model is used to evaluate the matching
score between the Myanmar text and the generated images during training. The
similarity model contains two networks: a text encoder and an image encoder.
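As a rough sketch of this matching score (assuming both encoders project their
inputs into a shared embedding space; the encoder architectures themselves are
omitted here), the score can be taken as the cosine similarity between the caption
embedding and the embedding of the generated image, and a loss derived from it can
penalize text-image mismatch during training:

import torch.nn.functional as F

def matching_score(text_emb, image_emb):
    # text_emb, image_emb: (batch, dim) outputs of the text encoder and
    # image encoder in a shared space (dimensions hypothetical).
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    return (text_emb * image_emb).sum(dim=-1)  # cosine similarity in [-1, 1]

def matching_loss(text_emb, image_emb):
    # Push each generated image toward the embedding of its own caption.
    return (1.0 - matching_score(text_emb, image_emb)).mean()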
The performance of the Myanmar T2I models is evaluated in two ways, quantitative
analysis and qualitative analysis, to assess the quality of the generated images.
In the quantitative analysis, DF-GAN+MSM obtains the highest Inception Scores and
the lowest FID scores on the generated images. In addition, DF-GAN+MSM obtains the
highest preference scores in the qualitative evaluation based on human perception.
Moreover, the proposed model is also implemented on the Caltech-UCSD Birds (CUB)
dataset, which is annotated in English, to show that the model also improves the
quality of synthesized images for a different language and dataset.
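For reference, the FID used in the quantitative analysis compares the statistics
of features extracted by a pretrained Inception network from real and generated
images. A standard sketch of the computation (the feature-extraction step is
omitted here) is:

import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats, fake_feats):
    # real_feats, fake_feats: (n, d) arrays of Inception activations for
    # real and generated images; lower FID means the generated image
    # distribution is closer to the real one.
    mu_r, mu_f = real_feats.mean(0), fake_feats.mean(0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_f = np.cov(fake_feats, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(sigma_r + sigma_f - 2 * covmean))

The Inception Score is computed from the class-probability outputs of the same
network, where higher scores indicate sharper and more diverse images. |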
en_US |