Faculty of Computer Science

https://onlineresource.ucsy.edu.mm/handle/123456789/694
2026-07-29T20:44:27Z

Dependency Head Annotation for Myanmar Dependency Treebank
Aye, Hnin Thu Zar; Pa, Win Pa
https://onlineresource.ucsy.edu.mm/handle/123456789/2545
2020-12-30T10:13:44Z 2020-11-01T00:00:00Z

Complete manual annotation of a dependency treebank requires resources such as annotators and annotation tools, takes a long time, and carries a high risk of inconsistent annotations for free-word-order languages such as Myanmar. This paper describes a dependency head annotation scheme based on Universal part-of-speech tags and Universal Dependencies for a Myanmar dependency treebank. To date, 22,810 sentences and 680,218 tokens from three corpora have been annotated for the treebank. Some language-specific issues are also described with examples. Raw syntactic structures were first annotated automatically by UDPipe according to Universal Dependencies, based on the Universal part-of-speech tag scheme. The dependency head structures produced by this unsupervised annotation step were then corrected manually in post-processing. To make manual correction faster and more reliable, with fewer errors, selected sentences were added to the training data after being corrected. The model was then retrained, and the remaining sentences were parsed by UDPipe. Post-processing was repeated until all sentences had been corrected. Specific annotation decisions for sentences encountered during post-processing are presented with examples. To assess parsing performance on the annotated data, cross-validation tests and parsing experiments were performed. The annotated treebank data were also evaluated with the CoNLL 2017 evaluation script. The results of the parsing experiments and evaluation are reported as unlabeled and labeled attachment scores, and they demonstrate that the proposed method is suitable for building Myanmar dependency trees. The syntactic structures of the treebank are also analyzed, and the resulting syntactic information is presented. As far as we know, this is the first work on dependency head annotation for a Myanmar dependency treebank.
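The parse, correct, and retrain cycle described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `bootstrap_annotate`, `train`, `correct`, and `batch_size` are hypothetical names, the toy parser and annotator are stand-ins, and no real UDPipe call is shown.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = List[str]   # tokens of one sentence
Heads = List[int]      # head index per token, 0 = root
Example = Tuple[Sentence, Heads]


def bootstrap_annotate(
    seed: List[Example],
    unlabeled: Sequence[Sentence],
    train: Callable[[List[Example]], Callable[[Sentence], Heads]],
    correct: Callable[[Sentence, Heads], Heads],
    batch_size: int,
) -> List[Example]:
    """Parse a batch, correct it by hand, retrain, and repeat until done."""
    treebank = list(seed)
    remaining = list(unlabeled)
    while remaining:
        # Retrain on everything corrected so far (hypothetical trainer).
        parser = train(treebank)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for sent in batch:
            predicted = parser(sent)                 # automatic annotation
            fixed = correct(sent, predicted)         # manual post-processing
            treebank.append((sent, fixed))
    return treebank


# Toy stand-ins so the sketch runs: the "parser" attaches the first token to
# the root and every other token to the one before it; the "annotator"
# accepts the prediction unchanged.
def toy_train(data: List[Example]) -> Callable[[Sentence], Heads]:
    return lambda sent: list(range(len(sent)))


def toy_correct(sent: Sentence, heads: Heads) -> Heads:
    return heads


result = bootstrap_annotate(
    [], [["a", "b"], ["c"], ["d", "e", "f"]], toy_train, toy_correct, batch_size=2
)
```

In a real setting `train` would wrap a parser training run and `correct` would be a human annotation pass; the batch size controls how often the model is retrained.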

Towards Burmese (Myanmar) Morphological Analysis: Syllable-based Tokenization and Part-of-speech Tagging
Ding, Chen Chen; Aye, Hnin Thu Zar; Pa, Win Pa; Nwet, Khin Thandar; Soe, Khin Mar; Utiyama, Masao; Sumita, Eiichiro
https://onlineresource.ucsy.edu.mm/handle/123456789/2544
2020-12-30T10:13:44Z 2019-06-01T00:00:00Z

This article presents a comprehensive study of two primary tasks in Burmese (Myanmar) morphological analysis: tokenization and part-of-speech (POS) tagging. Twenty thousand Burmese newswire sentences are annotated with two-layer tokenization and POS-tagging information, as one component of the Asian Language Treebank Project. The annotated corpus has been released under a CC BY-NC-SA license, and it was the largest open-access database of annotated Burmese when this manuscript was prepared in 2017. Detailed descriptions of the preparation, refinement, and features of the annotated corpus are provided in the first half of the article. Building on the annotated corpus, the second half of the article presents experiment-based investigations in which the standard sequence-labeling approach of conditional random fields and a long short-term memory (LSTM)-based recurrent neural network (RNN) are applied and discussed. We draw several general conclusions, covering the effect of joint tokenization and POS tagging and the importance of ensembling for stabilizing the performance of the LSTM-based RNN. This study provides a solid basis for further studies on Burmese processing.
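One common way to cast joint tokenization and POS tagging as a single sequence-labeling problem is to give every syllable a tag that combines a boundary marker with the POS of its token. The sketch below illustrates that idea only; the `B-`/`I-` prefixes, the tags `n` and `v`, and the placeholder syllables are assumptions, not the article's actual label set, and the two-layer tokenization is not modeled.

```python
from typing import List, Tuple

Token = Tuple[List[str], str]  # (syllables of the token, POS tag)


def encode(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Turn tokens into per-syllable labels such as B-n, I-n."""
    labeled = []
    for syllables, pos in tokens:
        for i, syl in enumerate(syllables):
            prefix = "B" if i == 0 else "I"   # B = token start, I = inside
            labeled.append((syl, f"{prefix}-{pos}"))
    return labeled


def decode(labeled: List[Tuple[str, str]]) -> List[Token]:
    """Rebuild tokens and POS tags from per-syllable labels."""
    tokens: List[Token] = []
    for syl, label in labeled:
        prefix, pos = label.split("-", 1)
        if prefix == "B" or not tokens:
            tokens.append(([syl], pos))       # start a new token
        else:
            tokens[-1][0].append(syl)         # extend the current token
    return tokens


# Placeholder syllables s1..s6 stand in for real Burmese syllables.
sentence = [(["s1", "s2"], "n"), (["s3"], "v"), (["s4", "s5", "s6"], "n")]
labels = encode(sentence)
```

A sequence labeler such as a CRF or an LSTM-based RNN would then predict one such label per syllable, and `decode` recovers both the token boundaries and the tags from its output.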

Unsupervised Dependency Corpus Annotation for Myanmar Language
Aye, Hnin Thu Zar; Pa, Win Pa; Thu, Ye Kyaw
https://onlineresource.ucsy.edu.mm/handle/123456789/2543
2020-12-30T10:13:44Z 2018-01-01T00:00:00Z

Dependency parsing represents the connections between linguistic units (words) as directed links. This paper presents the annotation of a general-domain corpus using an unsupervised approach that applies Universal part-of-speech (U-POS) tags, in order to build a treebank for unsupervised dependency parsing of the Myanmar language. Obtaining complete syntactic structures for the Myanmar language is still a hard task. The paper also presents the dependency structures of words in Myanmar sentences, covering general word and phrase order and the relations in basic sentence structures. UDPipe is used to annotate the corpus with U-POS. Preliminary results for the annotated trees and a parsing experiment are also presented. The parsing experiments are evaluated by UDPipe in terms of unlabeled and labeled attachment scores (UAS and LAS), which are 93.20% and 91.21%, respectively, in the test experiment.
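For readers unfamiliar with the two metrics, UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the dependency label to be correct. A minimal sketch, assuming gold and predicted analyses share the same tokenization (the function name `attachment_scores` and the example arcs are illustrative, not taken from the paper):

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts tokens with the correct head, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens with both the correct head and the correct label.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * full_hits / n


# Made-up four-token example: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
```

Here three of four heads are correct, and two of four tokens have both head and label correct, so LAS can never exceed UAS.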

Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar language
Cing, Dim Lam; Soe, Khin Mar
https://onlineresource.ucsy.edu.mm/handle/123456789/2531
2020-12-30T10:13:44Z 2020-04-01T00:00:00Z

In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also needed as a preprocessing step in NLP applications such as machine translation (MT) and information retrieval (IR). Many current research efforts develop word segmentation and POS tagging separately, using different methods to achieve high performance and accuracy. For the Myanmar language, there are also separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NNs) and Hidden Markov Models (HMMs). However, because of the complex morphological structure of the Myanmar language, the out-of-vocabulary (OOV) problem still exists. To avoid errors and improve segmentation by using POS information, segmentation and tagging should be performed at the same time. The main goal of developing a POS tagger for any language is to improve tagging accuracy and to remove the ambiguity in sentences that arises from the structure of the language. This paper focuses on developing word segmentation and a POS tagger for the Myanmar language, and it compares separate word segmentation and POS tagging with joint word segmentation and POS tagging.
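An HMM tagger is commonly decoded with the Viterbi algorithm, which finds the most probable tag sequence given start, transition, and emission probabilities. The abstract does not say how the authors decode their model, so the sketch below is a generic illustration: the probabilities are made up, English placeholder words are used, and the small `unk` floor for unseen words is an assumption standing in for whatever OOV handling the paper applies.

```python
import math
from typing import Dict, List


def viterbi(
    words: List[str],
    tags: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk: float = 1e-6,
) -> List[str]:
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best path ending in tag t at position i.
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk))
             for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the previous tag that maximizes the path score into t.
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            score = best[i - 1][prev] + math.log(trans_p[prev][t])
            best[i][t] = score + math.log(emit_p[t].get(words[i], unk))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Made-up two-tag model for illustration only.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.7}}
tagged = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sentences. A joint segmentation and tagging model would run the same kind of search over combined boundary and POS labels rather than over POS tags alone.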
