Abstract:
Fully manual annotation of a dependency treebank requires resources such as annotators and
annotation tools, takes a long time, and carries a high risk of inconsistent annotations
for free word order languages such as Myanmar. This paper describes a dependency head
annotation scheme based on Universal part-of-speech tags and Universal Dependencies for
a Myanmar dependency treebank. To date, 22,810 sentences and 680,218 tokens from three
corpora have been annotated for the treebank. Some language-specific
issues are also described with examples. Raw syntactic structures were annotated
automatically by UDPipe according to Universal Dependencies, based on the Universal
part-of-speech tag scheme. The automatically annotated dependency head structures were
then manually corrected in post-processing. To make post-processing faster, more reliable,
and less error-prone, selected sentences were added to the training data after being
corrected. The model was then retrained, and the remaining sentences were parsed again
by UDPipe. Post-processing was repeated until all sentences had been corrected.
Some specifications of the dependency annotation scheme for sentences encountered during
post-processing are presented with examples. To assess parsing performance on the
annotated data, cross-validation tests and parsing experiments were performed. In addition,
the annotated treebank data were evaluated with the CoNLL 2017 evaluation script.
The results of the parsing experiments and evaluation are reported as unlabeled and labeled
attachment scores, and they demonstrate that the proposed method is suitable for
building Myanmar dependency trees. The syntactic structures of the treebank are also
analyzed, and the resulting syntactic information is presented. To the best of our
knowledge, this is the first work on dependency head annotation for a Myanmar
dependency treebank.
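
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how unlabeled and labeled attachment scores can be computed. It is not the CoNLL 2017 evaluation script: it assumes that gold and predicted sentences share identical tokenization, and the function name `attachment_scores` and the (head, label) token representation are hypothetical choices made only for this illustration.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (head_index, dependency_label),
# where head_index 0 denotes the artificial root.
Token = Tuple[int, str]
Sentence = List[Token]


def attachment_scores(gold: List[Sentence], pred: List[Sentence]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens.

    Assumes gold and predicted sentences are aligned token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    total = 0
    correct_heads = 0            # counted for UAS
    correct_heads_and_labels = 0  # counted for LAS

    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ; tokenization must match")
        for (gold_head, gold_label), (pred_head, pred_label) in zip(gold_sent, pred_sent):
            total += 1
            if gold_head == pred_head:
                correct_heads += 1
                # LAS additionally requires the dependency label to match.
                if gold_label == pred_label:
                    correct_heads_and_labels += 1

    if total == 0:
        raise ValueError("no tokens to evaluate")

    return 100.0 * correct_heads / total, 100.0 * correct_heads_and_labels / total


# Toy four-token sentence: one wrong label and one wrong head in the prediction.
gold_example = [[(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]]
pred_example = [[(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]]
uas, las = attachment_scores(gold_example, pred_example)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")
```

In the toy example, three of four heads are correct and two of four tokens have both the correct head and the correct label, so a token with the right head but the wrong label counts toward UAS only.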