Abstract:
This work presents a Burmese (Myanmar)-English named entity transliteration
system built on Transformer models. The system segments names at the character,
sub-syllable, and syllable levels and draws on a carefully prepared dictionary
containing both foreign and native Myanmar-English entries. Transliterating named
entities accurately between Myanmar and English poses significant challenges due to
script differences, linguistic nuances, and varying entity structures. The proposed system
addresses these challenges by segmenting Myanmar named entities into character-level,
sub-syllable, and syllable units, using linguistic knowledge and domain-specific
dictionaries. Linguistic rules split Myanmar text into meaningful units that capture
the morphological and orthographic complexity of the Myanmar script. This
segmentation is crucial for accurately
aligning Myanmar entities with their English transliterations. The system is built on the
Transformer architecture, an attention-based sequence-to-sequence model. The model is
trained on a large corpus derived from our prepared Myanmar-English dictionary and
learns the transliteration patterns that map Myanmar script to English spellings.
The system is evaluated on a benchmark dataset of diverse Myanmar named entities and
their corresponding English transliterations. The experimental results show that the
approach achieves higher transliteration accuracy than baseline methods. Further
analyses examine the impact of different segmentation strategies, dictionary sizes,
and model configurations on transliteration quality. Overall, combining character,
sub-syllable, and syllable segmentation with a carefully prepared dictionary yields a
reliable and efficient system for transliterating Myanmar named entities into English,
supporting multilingual communication and data interoperability.
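
To make the segmentation levels concrete, the following is a minimal sketch of character-level and syllable-level segmentation for Myanmar text. It is an illustration under stated assumptions, not the system's actual rule set: the function names `segment_characters` and `segment_syllables` are hypothetical, and the syllable rule is a common simplification (start a new syllable at a consonant that is neither stacked under a virama nor closed by an asat or virama). It ignores independent vowels, digits, and punctuation. The sub-syllable level is omitted because the abstract does not define its rules.

```python
import re

# Unicode ranges for Myanmar script (assumed simplification for illustration).
CONSONANT = "\u1000-\u1021"  # KA .. A
ASAT = "\u103a"              # vowel killer: marks a syllable-final consonant
VIRAMA = "\u1039"            # stacking sign: the next consonant is subscripted

# A consonant starts a new syllable when it is not stacked (no virama before it)
# and is not a final or stacking consonant (no asat or virama after it).
_SYLLABLE_START = re.compile(
    "(?<!" + VIRAMA + ")[" + CONSONANT + "](?![" + ASAT + VIRAMA + "])"
)


def segment_characters(text):
    """Character-level segmentation: one unit per Unicode code point."""
    return list(text)


def segment_syllables(text):
    """Syllable-level segmentation using the simplified rule above."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own unit
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == "__main__":
    word = "\u1019\u103c\u1014\u103a\u1019\u102c"  # the word "Myanmar"
    print(segment_characters(word))
    print(segment_syllables(word))
```

Under this rule, the six code points of the example word group into two syllable units. Coarser units shorten the source sequence the Transformer has to align with the English spelling, at the cost of a larger vocabulary.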