Abstract:
As internet technology develops, the volume of information available to internet
users on the web also increases. Users can draw on that information and contribute
opinions that support decision making. Sentiment analysis, also known as opinion
mining, is a text categorization task that identifies the opinion expressed in a
piece of text. Sentiment analysis of text documents is an active research area. The
essential text resources found on social media, such as reviews, comments, tweets,
posts, opinions, and articles, are available in a variety of languages. These can be
analyzed to learn more about people's attitudes, beliefs, and feelings concerning various
topics and products. Using the Asian Language Treebank, news from the Ministry
of Information website (www.moi.gov.mm), and comments from a social media webpage
(www.facebook.myanmarcelebrity.com.mm), this paper targets sentiment analysis of
news and comments in Myanmar social media. To categorize the sentiment polarity of
each social media comment as "positive," "negative," or "neutral," this paper
proposes automated analyzer methods.
This system constructs a corpus of news comments in the Myanmar language. The
datasets were split into training and testing sets, and the training set was further
split at random using cross-validation to guard against overfitting. The case of
imbalanced datasets was then considered in order to improve the performance of the
classifier. The hyperparameters were tuned to improve the classification
results. In addition, a number of information visualization
techniques were used to display the results, indicate how effectively the classifiers
performed, and highlight the key terms that had an impact on the classification process.
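
The cross-validation split and the handling of class imbalance described above can be
illustrated with a minimal sketch. It assumes stratified folds and inverse-frequency
class weights, which are common choices but are not confirmed as the method used in
this paper. The function names and the toy label list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds keep class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    n, n_classes = len(labels), len(counts)
    return {label: n / (n_classes * c) for label, c in counts.items()}


# Toy, made-up label distribution (assumption; not the paper's data).
labels = ["positive"] * 10 + ["negative"] * 5 + ["neutral"] * 5
splits = list(stratified_k_fold(labels, k=5))
weights = balanced_class_weights(labels)
```

Rarer classes receive larger weights, so a classifier that accepts per-class weights
penalizes mistakes on them more heavily.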
Feature weighting and selection are required in sentiment analysis to improve
efficiency. The proposed system implements sentiment analysis for Myanmar
news and comments. TF-IDF and N-grams are used for feature weighting and
extraction. The support vector machine (SVM) is a supervised learning method that
analyzes data and recognizes patterns used for classification. Hyperparameter
optimization is used to find the set of model configuration arguments that gives
the best model performance. Random search is an algorithm in which random
combinations of hyperparameters are chosen and used to train a model; the
best-performing combination is then selected. This system improves Myanmar
news sentiment analysis using SVM with random search optimization. It also
studies machine learning algorithms for Myanmar sentiment analysis and compares
the results of Naïve Bayes, Linear SVC, and Linear SVC with random search
optimization. Linear SVC with RandomizedSearchCV has the highest performance.
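
As a rough illustration of the feature weighting and random search steps named above,
the following standard-library sketch computes TF-IDF weights over word N-grams and
runs a generic random search. It is not the paper's implementation, which reports
using Linear SVC with RandomizedSearchCV. All function names, the placeholder tokens,
the search space, and the toy scoring function (a stand-in for training a classifier
and measuring its cross-validated accuracy) are assumptions made for illustration.
Documents are assumed to be already segmented into words.

```python
import math
import random
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=2):
    """Compute TF-IDF weights per document; docs are pre-tokenized lists."""
    grams = [word_ngrams(d, n_min, n_max) for d in docs]
    df = Counter(term for g in grams for term in set(g))
    n_docs = len(docs)
    weighted = []
    for g in grams:
        tf = Counter(g)
        # Plain idf = log(N / df); libraries often use smoothed variants.
        weighted.append(
            {t: (c / len(g)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weighted


def random_search(score_fn, space, n_iter=20, seed=0):
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder tokens (assumption); real input would be segmented Myanmar text.
docs = [["good", "news"], ["bad", "news"], ["good", "good", "day"]]
features = tfidf(docs)

# Hypothetical search space and toy score that peaks at C=1, ngram_max=2.
space = {"C": [0.01, 0.1, 1, 10, 100], "ngram_max": [1, 2, 3]}


def toy_score(params):
    return -abs(math.log10(params["C"])) - abs(params["ngram_max"] - 2)


best_params, best_score = random_search(toy_score, space, n_iter=20, seed=0)
```

Because configurations are sampled rather than enumerated, random search is not
guaranteed to find the best combination in the space; raising `n_iter` trades more
computation for a better chance of doing so.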
This system shows the most significant terms that had an impact on the
classification process as well as the classifiers' performance. The results are then
presented, along with ideas for how to improve them in the future and information on
how well the proposed systems worked.