UCSY's Research Repository

Font Script Identification Based on N-gram Text Categorization

dc.contributor.author Than, Kyaw Myo
dc.contributor.author Htay, Hla Hla
dc.date.accessioned 2019-07-12T05:46:11Z
dc.date.available 2019-07-12T05:46:11Z
dc.date.issued 2010-12-16
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/865
dc.description.abstract In this paper, we propose a method for identifying the font scripts of the Myanmar language. Because Myanmar font scripts lack a nationwide standardized encoding scheme, knowledge written in the Myanmar language is scattered across internet pages. A font script identifier is essential for merging this scattered knowledge for NLP applications such as text categorization, information retrieval, and text summarization. Our proposed method uses N-gram based text categorization. A piece of text for each of 11 font scripts is taken for training. TF-IDF (Term Frequency-Inverse Document Frequency) weights of character N-grams are computed for each font script and stored as the profile of that font script. When a new text document is given for testing, its TF-IDF weights are computed and the cosine similarity between the test profile and each trained profile is measured. The font script with the highest similarity score is taken as the result. The TF-IDF approach achieved 100% accuracy when tested on the 11 different font scripts. Therefore, this method works well for Myanmar font script identification. en_US
dc.language.iso en en_US
dc.publisher Fifth Local Conference on Parallel and Soft Computing en_US
dc.subject Font en_US
dc.subject Font Script en_US
dc.subject Language Identification en_US
dc.subject Font Script Identification en_US
dc.subject N-gram en_US
dc.subject Text Categorization en_US
dc.subject TF-IDF Weights en_US
dc.title Font Script Identification Based on N-gram Text Categorization en_US
dc.type Article en_US
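
The pipeline described in the abstract (character N-gram counts, TF-IDF weighting, one stored profile per font script, cosine similarity against a test document) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the N-gram length of 3, the smoothed IDF formula, the function names, and the toy training strings and labels are all assumptions, since the record does not specify them.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text (n=3 is an assumption)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tfidf_vector(grams, idf):
    """Weight relative n-gram frequencies by IDF; unseen n-grams are ignored."""
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: (c / total) * idf[g] for g, c in grams.items() if g in idf}


def build_profiles(training_texts, n=3):
    """Build one TF-IDF profile per font script label.

    training_texts maps a (hypothetical) font script label to a sample text.
    """
    counts = {label: char_ngrams(text, n) for label, text in training_texts.items()}
    num_docs = len(counts)
    # Document frequency: number of font script samples containing each n-gram.
    doc_freq = Counter()
    for grams in counts.values():
        doc_freq.update(grams.keys())
    # Smoothed IDF (an assumed variant) so shared n-grams keep a nonzero weight.
    idf = {g: math.log((1 + num_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    profiles = {label: tfidf_vector(grams, idf) for label, grams in counts.items()}
    return profiles, idf


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text, profiles, idf, n=3):
    """Return the best-matching label and the similarity score for every label."""
    query = tfidf_vector(char_ngrams(text, n), idf)
    scores = {label: cosine_similarity(query, p) for label, p in profiles.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy placeholder strings standing in for text encoded in two font scripts.
    training = {"font_a": "abcabcabcabd", "font_b": "xyzxyzxyzxyw"}
    profiles, idf = build_profiles(training)
    label, scores = identify("abcab", profiles, idf)
    print(label, scores)
```

In practice each training sample would be real text encoded in one of the font scripts, and the profiles would be computed once and stored so that only the test document needs to be vectorized at identification time.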

