dc.contributor.author | Win, Saw |
dc.contributor.author | Pa, Win Pa |
dc.date.accessioned | 2022-07-04T05:48:05Z |
dc.date.available | 2022-07-04T05:48:05Z |
dc.date.issued | 2021-02-25 |
dc.identifier.uri | https://onlineresource.ucsy.edu.mm/handle/123456789/2702 |
dc.description.abstract | Myanmar is a low-resource language, and obtaining large-scale cleaned data for natural language processing (NLP) tasks is challenging and expensive despite the rapid progress in NLP. Deep learning has boosted the development of pre-trained language models, which has led to significant performance gains. Despite their popularity, the majority of available models have been trained either on English data or on a concatenation of data from many languages, which limits their practical use for languages other than English. Monolingual pre-trained language models based on Bidirectional Encoder Representations from Transformers (BERT) have been shown to outperform multilingual models on many downstream NLP tasks under the same configurations. However, a large monolingual corpus and a monolingual pre-trained language model for the Myanmar language are not yet publicly available. In this paper, we introduce a large monolingual corpus called MyCorpus and release a Myanmar pre-trained language model (MyanmarBERT) based on BERT. Myanmar NLP tasks such as part-of-speech (POS) tagging and named-entity recognition (NER) are used to evaluate MyanmarBERT against Multilingual BERT (M-BERT), and comparative results for the two models are presented. MyanmarBERT will be useful for researchers working on Myanmar NLP; the pre-trained model is available at http://www.nlpresearch-ucsy.edu.mm/mybert.html. | en_US
dc.language.iso | en_US | en_US
dc.publisher | ICCA | en_US
dc.subject | BERT, Pre-trained Language Model, Named Entity Recognition, POS tagging, Myanmar Language | en_US
dc.title | MyanmarBERT: Myanmar Pre-trained Language Model using BERT | en_US
dc.type | Presentation | en_US
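
A minimal sketch of how a BERT-style checkpoint (the released MyanmarBERT or the M-BERT baseline named in the abstract) might be loaded for a token-classification task such as POS tagging or NER, using the Hugging Face transformers library. The local checkpoint path, label count, and example tokens below are assumptions for illustration, not details taken from the record; only "bert-base-multilingual-cased" refers to the published M-BERT checkpoint.

    # Sketch: fine-tuning setup for POS tagging / NER with a BERT-style checkpoint.
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical local directory holding the downloaded MyanmarBERT checkpoint
    # (see http://www.nlpresearch-ucsy.edu.mm/mybert.html for the released model).
    MODEL_PATH = "path/to/myanmarbert"
    # MODEL_PATH = "bert-base-multilingual-cased"  # M-BERT baseline for comparison

    NUM_LABELS = 9  # assumption: size of the POS or NER tag set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_PATH, num_labels=NUM_LABELS
    )

    # Tokenize a word-segmented Myanmar sentence and predict one tag per token.
    tokens = ["မြန်မာ", "ဘာသာ"]  # example segmented input
    inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)
    print(predictions)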