UCSY's Research Repository

MyanmarBERT: Myanmar Pre-trained Language Model using BERT

dc.contributor.author Win, Saw
dc.contributor.author Pa, Win Pa
dc.date.accessioned 2022-07-04T05:48:05Z
dc.date.available 2022-07-04T05:48:05Z
dc.date.issued 2021-02-25
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2702
dc.description.abstract Myanmar is a low-resource language, and obtaining large-scale cleaned data for natural language processing (NLP) tasks is challenging and expensive. Deep learning has boosted the development of pre-trained language models, which have led to significant performance gains. Despite their popularity, most available models have been trained either on English data or on concatenated multilingual data, which limits their practical use for languages other than English. Recent monolingual pre-trained language models based on Bidirectional Encoder Representations from Transformers (BERT) outperform multilingual models on many downstream NLP tasks under the same configurations. However, neither a large monolingual corpus nor a monolingual pre-trained language model for the Myanmar language is publicly available yet. In this paper, we introduce a large monolingual corpus called MyCorpus and release a Myanmar pre-trained language model (MyanmarBERT) based on BERT. We evaluate MyanmarBERT and Multilingual BERT (M-BERT) on two Myanmar NLP tasks, part-of-speech (POS) tagging and named-entity recognition (NER), and present comparative results for the two models. MyanmarBERT will be useful for researchers working on Myanmar NLP, and the pre-trained model is available at http://www.nlpresearch-ucsy.edu.mm/mybert.html. en_US
dc.language.iso en_US en_US
dc.publisher ICCA en_US
dc.subject BERT, Pre-trained Language Model, Named Entity Recognition, POS tagging, Myanmar Language en_US
dc.title MyanmarBERT: Myanmar Pre-trained Language Model using BERT en_US
dc.type Presentation en_US

