Faculty of Computer Science Collection
https://onlineresource.ucsy.edu.mm/handle/123456789/696
2024-03-29T14:08:32Z
Dependency Head Annotation for Myanmar Dependency Treebank
https://onlineresource.ucsy.edu.mm/handle/123456789/2545
Aye, Hnin Thu Zar; Pa, Win Pa
Fully manual annotation of a dependency treebank requires resources such as annotators and
annotation tools, takes a long time, and carries a high risk of inconsistent annotations
for free-word-order languages such as Myanmar. This paper describes a dependency head
annotation scheme based on Universal part-of-speech tags and Universal Dependencies for a
Myanmar dependency treebank. So far, 22,810 sentences and 680,218 tokens from three
corpora have been annotated for the treebank. Some language-specific issues are also
described with examples. Raw syntactic structures were first annotated automatically by
UDPipe according to Universal Dependencies, based on the Universal part-of-speech tag
scheme. These automatically annotated dependency head structures were then manually
corrected in post-processing. To make post-processing reliable and fast, and to reduce
errors in manual correction, selected sentences were added to the training data after
being corrected. The model was then retrained and the remaining sentences were parsed by
UDPipe. Post-processing was repeated until all sentences had been corrected. Some
specifications of the dependency annotation scheme for sentences encountered in
post-processing are presented with examples. To assess parsing performance on the
annotated data, cross-validation tests and parsing experiments were performed. The
annotated treebank data were also evaluated with the CoNLL 2017 evaluation script. The
results of the parsing experiments and evaluation are reported as unlabeled and labeled
attachment scores and demonstrate that the proposed method is a suitable way of building
Myanmar dependency trees. The syntactic structures of the treebank are also analyzed, and
the resulting syntactic information is presented. As far as we know, this is the first
work on dependency head annotation for a Myanmar dependency treebank.
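
The cycle of automatic parsing, manual correction, and retraining described in this abstract can be sketched as a generic loop. This is a minimal illustration, not the authors' implementation: `bootstrap_treebank`, its parameters, and the toy stand-ins are hypothetical names. In practice the training and parsing steps would call UDPipe and the correction step would be done by a human annotator. The sketch also assumes a small seed of already annotated trees and a fixed batch size, neither of which is stated in the abstract.

```python
from typing import Callable, List, Sequence, TypeVar

Sentence = TypeVar("Sentence")
Tree = TypeVar("Tree")
Model = TypeVar("Model")


def bootstrap_treebank(
    raw_sentences: Sequence[Sentence],
    seed_trees: Sequence[Tree],
    train: Callable[[List[Tree]], Model],
    parse: Callable[[Model, Sentence], Tree],
    correct: Callable[[Sentence, Tree], Tree],
    batch_size: int,
) -> List[Tree]:
    """Grow a treebank by repeating: train, parse a batch, correct, add."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    treebank = list(seed_trees)
    remaining = list(raw_sentences)
    while remaining:
        model = train(treebank)  # retrain on everything corrected so far
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for sentence in batch:
            draft = parse(model, sentence)             # automatic annotation
            treebank.append(correct(sentence, draft))  # manual post-processing
    return treebank


# Toy stand-ins (hypothetical): a "model" is just the training-set size,
# a "tree" is a list of head indices, and correction accepts the draft as is.
training_sizes = []


def toy_train(trees):
    training_sizes.append(len(trees))
    return len(trees)


def toy_parse(model, sentence):
    # Attach every token to the previous one; the first token gets head 0 (root).
    return list(range(len(sentence)))


def toy_correct(sentence, draft):
    return draft


sentences = [["a", "b"], ["c"], ["d", "e", "f"]]
result = bootstrap_treebank(
    sentences, [[0]], toy_train, toy_parse, toy_correct, batch_size=2
)
```

Each pass retrains on a larger corrected set, so later batches should need fewer manual fixes; the batch size trades retraining cost against how quickly the parser benefits from new corrections.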
2020-11-01T00:00:00Z
Towards Burmese (Myanmar) Morphological Analysis: Syllable-based Tokenization and Part-of-speech Tagging
https://onlineresource.ucsy.edu.mm/handle/123456789/2544
Ding, Chen Chen; Aye, Hnin Thu Zar; Pa, Win Pa; Nwet, Khin Thandar; Soe, Khin Mar; Utiyama, Masao; Sumita, Eiichiro
This article presents a comprehensive study on two primary tasks in Burmese (Myanmar) morphological
analysis: tokenization and part-of-speech (POS) tagging. Twenty thousand Burmese sentences of newswire
are annotated with two-layer tokenization and POS-tagging information, as one component of the Asian
Language Treebank Project. The annotated corpus has been released under a CC BY-NC-SA license, and it was
the largest open-access database of annotated Burmese at the time this manuscript was prepared in 2017. Detailed
descriptions of the preparation, refinement, and features of the annotated corpus are provided in the first half
of the article. Facilitated by the annotated corpus, experiment-based investigations are presented in the second
half of the article, wherein the standard sequence-labeling approach of conditional random fields and a long
short-term memory (LSTM)-based recurrent neural network (RNN) are applied and discussed. We obtained
several general conclusions, covering the effect of joint tokenization and POS-tagging and the importance of
ensembling for stabilizing the performance of the LSTM-based RNN. This study provides a solid
basis for further studies on Burmese processing.
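
Joint tokenization and POS-tagging is commonly cast as a single sequence-labeling problem over syllables, where each syllable's label carries both a word-boundary marker and a POS tag. The sketch below shows one widely used encoding with `B-`/`I-` prefixes. It is an assumption for illustration and not necessarily the label scheme used in the article; the romanized syllables, tags, and function names are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[List[str], str]  # (syllables of the word, POS tag)


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Turn tagged words into (syllable, joint label) pairs."""
    labeled = []
    for syllables, tag in words:
        for i, syllable in enumerate(syllables):
            prefix = "B" if i == 0 else "I"  # B = word start, I = inside word
            labeled.append((syllable, f"{prefix}-{tag}"))
    return labeled


def decode(labeled: List[Tuple[str, str]]) -> List[Word]:
    """Rebuild tagged words from (syllable, joint label) pairs."""
    words: List[Word] = []
    for syllable, label in labeled:
        prefix, tag = label.split("-", 1)
        if prefix == "B" or not words:
            words.append(([syllable], tag))  # start a new word
        else:
            words[-1][0].append(syllable)    # continue the current word
    return words


# Placeholder romanized syllables, not real corpus data.
tagged_words = [(["myan", "mar"], "PROPN"), (["sa"], "NOUN")]
labels = encode(tagged_words)
restored = decode(labels)
```

With this encoding, a sequence labeler such as a CRF or an LSTM-based RNN predicts one label per syllable, and word boundaries and POS tags are recovered together by `decode`.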
2019-06-01T00:00:00Z
Unsupervised Dependency Corpus Annotation for Myanmar Language
https://onlineresource.ucsy.edu.mm/handle/123456789/2543
Aye, Hnin Thu Zar; Pa, Win Pa; Thu, Ye Kyaw
Dependency parsing represents the connections between
linguistic units (words) as directed links. This paper presents
the annotation of a general-domain corpus using an
unsupervised approach that applies Universal part-of-speech
(U-POS) tags to build a treebank for unsupervised dependency
parsing of the Myanmar language. Obtaining complete
syntactic structures for the Myanmar language remains a hard
task. Dependency structures of words in Myanmar sentences are
also presented, covering general word and phrase orders and
the relations in basic sentence structures. UDPipe is used for
annotation with U-POS. Moreover, preliminary results for the
annotated trees and a parsing experiment are presented. The
parsing experiments are evaluated with UDPipe in terms of
unlabeled and labeled attachment scores (UAS and LAS), which
are 93.20% and 91.21%, respectively, in the test experiment.
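
For reference, UAS is the share of tokens whose predicted head is correct, and LAS is the share whose head and dependency label are both correct. The sketch below computes both for a single sentence. It assumes gold and predicted analyses share the same tokenization, and the function name and example trees are hypothetical, not taken from the paper or from UDPipe.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label); head 0 is the root


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_arcs = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return correct_heads / total, correct_arcs / total


# Hypothetical four-token sentence: one wrong label, one wrong head.
gold_arcs = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred_arcs = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold_arcs, pred_arcs)  # uas == 0.75, las == 0.5
```

Corpus-level scores are normally computed over all tokens in the test set rather than by averaging per-sentence values.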
2018-01-01T00:00:00Z
Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar language
https://onlineresource.ucsy.edu.mm/handle/123456789/2531
Cing, Dim Lam; Soe, Khin Mar
In Natural Language Processing (NLP), word segmentation and part-of-speech
(POS) tagging are fundamental tasks. POS information is also needed as a
preprocessing step for NLP applications such as machine translation (MT)
and information retrieval (IR). Currently, many research efforts develop
word segmentation and POS tagging separately, using different methods to
achieve high performance and accuracy. For the Myanmar language, there are
also separate word segmenters and POS taggers based on statistical
approaches such as Neural Networks (NNs) and Hidden Markov Models (HMMs).
However, because of the complex morphological structure of the Myanmar
language, the out-of-vocabulary (OOV) problem still exists. To avoid errors
and improve segmentation by using POS information, segmentation and tagging
should be performed at the same time. The main goal of developing a POS
tagger for any language is to improve tagging accuracy and remove ambiguity
in sentences caused by language structure. This paper focuses on developing
a word segmenter and POS tagger for the Myanmar language and compares
separate word segmentation and POS tagging with joint word segmentation and
POS tagging.
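
As background on HMM-based tagging, the most likely tag sequence under an HMM is usually found with the Viterbi algorithm. The sketch below is a generic textbook version over already segmented words, not the model from the paper: the probabilities, the English toy words, the probability floor for unseen events, and the function name are all hypothetical.

```python
import math
from typing import Dict, List


def viterbi(
    words: List[str],
    tags: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    floor: float = 1e-12,
) -> List[str]:
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []

    def log_p(p: float) -> float:
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # Each column maps tag -> (best log probability, previous tag).
    columns = [{
        t: (log_p(start_p.get(t, 0.0)) + log_p(emit_p[t].get(words[0], 0.0)), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_p(trans_p[p].get(t, 0.0)), p) for p in tags
            )
            column[t] = (score + log_p(emit_p[t].get(word, 0.0)), prev)
        columns.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model with made-up probabilities.
toy_tags = ["NOUN", "VERB"]
toy_start = {"NOUN": 0.8, "VERB": 0.2}
toy_trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
toy_emit = {"NOUN": {"dogs": 0.6, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.7}}
best_path = viterbi(["dogs", "bark"], toy_tags, toy_start, toy_trans, toy_emit)
```

A joint segmentation and tagging model would run a similar search over candidate word boundaries as well as tags, which is one way segmentation can benefit from POS information.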
2020-04-01T00:00:00Z