dc.contributor.author | Win, Saw |
dc.contributor.author | Pa, Win Pa |
dc.date.accessioned | 2022-07-04T05:48:05Z |
dc.date.available | 2022-07-04T05:48:05Z |
dc.date.issued | 2021-02-25 |
dc.identifier.uri | https://onlineresource.ucsy.edu.mm/handle/123456789/2702 |
dc.description.abstract | Myanmar is a low-resource language, and obtaining large-scale cleaned data for natural language processing (NLP) tasks is challenging and expensive despite the rapid progress in NLP. Deep learning has boosted the development of pre-trained language models, which has led to significant performance gains. Despite their popularity, the majority of available models have been trained either on English data or on a concatenation of data from many languages, which limits their practical use for languages other than English. Monolingual pre-trained language models based on Bidirectional Encoder Representations from Transformers (BERT) have been shown to outperform multilingual models on many downstream NLP tasks under the same configurations. However, a large monolingual corpus and a monolingual pre-trained language model for the Myanmar language are not yet publicly available. In this paper, we introduce a large monolingual corpus called MyCorpus and release a Myanmar pre-trained language model (MyanmarBERT) based on BERT. Myanmar NLP tasks such as part-of-speech (POS) tagging and named-entity recognition (NER) are used to evaluate MyanmarBERT against Multilingual BERT (M-BERT), and comparative results for the two models are presented. MyanmarBERT will be useful for researchers working on Myanmar NLP; the pre-trained model is available at http://www.nlpresearch-ucsy.edu.mm/mybert.html. | en_US
dc.language.iso | en_US | en_US
dc.publisher | ICCA | en_US
dc.subject | BERT, Pre-trained Language Model, Named Entity Recognition, POS tagging, Myanmar Language | en_US
dc.title | MyanmarBERT: Myanmar Pre-trained Language Model using BERT | en_US
dc.type | Presentation | en_US
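
A minimal sketch of how a BERT-style checkpoint (the released MyanmarBERT or the M-BERT baseline named in the abstract) might be loaded for a token-classification task such as POS tagging or NER, using the Hugging Face transformers library. The local checkpoint path, label count, and example tokens below are assumptions for illustration, not details taken from the record; only "bert-base-multilingual-cased" refers to the published M-BERT checkpoint.

    # Sketch: fine-tuning setup for POS tagging / NER with a BERT-style checkpoint.
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical local directory holding the downloaded MyanmarBERT checkpoint
    # (see http://www.nlpresearch-ucsy.edu.mm/mybert.html for the released model).
    MODEL_PATH = "path/to/myanmarbert"
    # MODEL_PATH = "bert-base-multilingual-cased"  # M-BERT baseline for comparison

    NUM_LABELS = 9  # assumption: size of the POS or NER tag set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_PATH, num_labels=NUM_LABELS
    )

    # Tokenize a word-segmented Myanmar sentence and predict one tag per token.
    tokens = ["မြန်မာ", "ဘာသာ"]  # example segmented input
    inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)
    print(predictions)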