UCSY's Research Repository

Sentiment Analysis System in Big Data Environment


dc.contributor.author Chan, Wint Nyein
dc.date.accessioned 2019-09-23T05:18:47Z
dc.date.available 2019-09-23T05:18:47Z
dc.date.issued 2019-01
dc.identifier.uri http://onlineresource.ucsy.edu.mm/handle/123456789/2257
dc.description.abstract Social media now generate Big Data, a large volume of both structured and unstructured data. Social media are powerful marketing tools, and Social Big Data can offer business insights. The major challenge for Social Big Data is finding efficient techniques to collect a large volume of social data and to extract insights from it. Sentiment Analysis of Social Big Data can provide business insights by extracting public opinion. Traditional analytics platforms need to be scaled up to analyze a large volume of Social Big Data. Social data are by nature short and generally not written with proper grammar, which makes highly reliable Sentiment Analysis results difficult to achieve. Although learning-based approaches are good for sentiment classification, acquiring effective training data is a challenge because manual labeling of training data is time- and labor-intensive. Sentiment Analysis based on a multiclass classification scheme classifies text into more detailed sentiment labels. However, multiclass classification with a Single-tier architecture, in which a single model is trained on the entire labeled data set, may decrease classification accuracy. The presence of sarcasm, an interfering factor that can flip the sentiment of a given text, is another challenge for Sentiment Analysis. Real-time tracking and analytics are important for Social Big Data because speed may be the most important competitive advantage in business. Compared with batch processing of Sentiment Analysis on a Big Data Analytics platform, real-time analytics is data intensive in nature and must efficiently collect and process data of large volume and high velocity.
In this research, the proposed Sentiment Analysis system is implemented with different architectures on different platforms to provide valuable information by analyzing large-scale social data in an efficient and timely manner. First, Sentiment Analysis is implemented on a traditional analytics platform, with model selection performed by comparing the performance of three machine learning algorithms (Naïve Bayes, Random Forest and Linear Regression). To develop a scalable, high-performance system, Sentiment Analysis is then implemented on a Big Data Analytics Platform (Hadoop MapReduce). This system achieves high-performance sentiment classification by combining the effortless setup of a lexicon-based classifier with a learning-based classifier. The Multi-tier Sentiment Analysis system on Big Data Analytics Platform (MSABDP) is developed to achieve high-performance multiclass classification; it combines lexicon-based and learning-based classification schemes in a Multi-tier architecture. The Multi-tier Sentiment Analysis system with sarcasm detection on Hadoop (MSASDH) is proposed to achieve high-performance sentiment classification. MSASDH identifies sarcasm and sentiment-emotion through a rule-based sarcasm-sentiment detection scheme and sentiment classification with a Multi-tier architecture. The Real-time Multi-tier Sentiment Analysis system (RMSA) is implemented to achieve high-performance multiclass classification in real time. To improve classification accuracy, a suitable classifier is selected by comparing the accuracy of three learning-based multiclass classification techniques: Naïve Bayes, Linear SVC and Logistic Regression. On the traditional analytics platform, the Naïve Bayes classifier performs better than the other algorithms and the proposed system achieves promising accuracy.
The evaluation results show that the proposed system on the Big Data Analytics Platform achieves a promising accuracy of 84.2% and scales to large-scale data, with running time decreasing as nodes are added to the cluster. They also show that the proposed MSABDP significantly improves classification accuracy, by 7%, over multiclass classification based on a Single-tier architecture, and that detecting sarcasm enhances the accuracy of Sentiment Analysis. Finally, Real-time Multi-tier Sentiment Analysis achieves promising accuracy, with Linear SVC performing better than the other techniques for it. en_US
dc.language.iso en_US en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.title Sentiment Analysis System in Big Data Environment en_US
dc.type Thesis en_US
