Abstract:
This dissertation focuses on enhancing a Myanmar Information Retrieval (IR)
system so that it produces more natural text for a given input text. A typical IR system
has two main components: the text query (expressing the user's needs or preferences)
and the text documents (related to that query). Both the query and the documents are
important for the clarity and effectiveness of an IR system. Therefore, this research
focuses on both the text queries and the documents in the Myanmar IR system.
In the contemporary era dominated by Information Technology (IT), search
engines such as Google have become ubiquitous tools for people seeking access
to a vast array of information. These platforms serve as indispensable resources,
enabling users to locate and acquire knowledge on a wide range of topics
according to their needs and interests. Searching for news in English or Myanmar has
become highly convenient, requiring minimal effort to access a wealth of
information.
The structure of IR has been altered dramatically by the introduction of neural
models, which enable a more refined analysis of textual data. For this research, a
Myanmar News dataset was prepared by collecting articles from the Myanmar News
website. Each document in this dataset contains two parts: a title and its contents.
Evaluations of different neural ranking models were conducted, and the
results are thoroughly analyzed and discussed. The analysis examines how various
neural ranking models capture intricate semantic connections, with the aim of
improving the effectiveness of IR systems. Pivotal neural ranking models that have
had a profound impact on the field, namely DRMM, MP, Duet, KNRM, PACRR,
CONV-KNRM, and MZ-CONV-KNRM, are examined in depth to investigate their
implications for improving the precision and efficiency of retrieval systems.
A further evaluation was performed by fine-tuning the pre-trained model
Vanilla-BERT. This model outperformed the baseline methods, showing improvements
in MAP, MRR, P@1, P@3, and overall retrieval performance. These findings also
extend to the retrieved similarity scores, highlighting the potential for enhanced IR
capabilities.
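To make the reported metrics concrete, the following is a minimal Python sketch of how MAP, MRR, P@1 and P@3 are commonly computed from ranked result lists with binary relevance judgments. The function names, the query/document identifiers and the toy data are hypothetical illustrations; the evaluation toolkit actually used in this research may differ in details such as tie handling or how queries without relevant documents are treated.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def evaluate(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average each metric over all queries in the run (hypothetical helper)."""
    queries = list(run)
    if not queries:
        return {"MAP": 0.0, "MRR": 0.0, "P@1": 0.0, "P@3": 0.0}
    n = len(queries)

    def mean(metric) -> float:
        return sum(metric(run[q], qrels.get(q, set())) for q in queries) / n

    return {
        "MAP": mean(average_precision),
        "MRR": mean(reciprocal_rank),
        "P@1": mean(lambda r, rel: precision_at_k(r, rel, 1)),
        "P@3": mean(lambda r, rel: precision_at_k(r, rel, 3)),
    }


# Toy example: two queries, each with a ranked list of document IDs.
run = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d2", "d4"}}
scores = evaluate(run, qrels)
print({name: round(value, 4) for name, value in scores.items()})
# MAP ≈ 0.6667, MRR = 0.75, P@1 = 0.5, P@3 = 0.5
```

MRR and P@1 reward placing a relevant document at the very top, while MAP also accounts for how well all relevant documents are ranked, which is why the metrics are usually reported together.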