Abstract:
Social Media now generate Big Data, that is, large volumes of both structured and
unstructured data. Social Media are powerful marketing tools, and Social Big Data
can offer business insights. The major challenge for Social Big Data is finding
efficient techniques to collect a large volume of social data and to extract insights
from the collected data. Sentiment Analysis of Social Big Data can provide business
insights by extracting public opinion. The
traditional analytic platforms need to be scaled up to analyze large volumes of
Social Big Data. Social data are by nature short and often do not follow proper
grammatical rules, which makes highly reliable Sentiment Analysis results difficult
to achieve. Although learning-based approaches perform well for sentiment
classification, acquiring effective training data is a challenge, because manual
labeling of training data is time-consuming and labor-intensive. Sentiment Analysis
based on a multiclass classification scheme aims to classify text into more detailed
sentiment labels. However, multiclass classification with a Single-tier architecture,
in which a single model is trained on the entire labeled data set, may decrease
classification accuracy. The presence of sarcasm, an interfering factor that can flip
the sentiment of a given text, is another challenge for Sentiment Analysis. Real-time
tracking and analytics are important for Social Big Data because speed may be the
most important competitive business advantage. Compared to batch processing of
Sentiment Analysis on a Big Data Analytics platform, Real-time analytics is
data-intensive by nature and requires efficient collection and processing of
large-volume, high-velocity data.
In this research, the proposed Sentiment Analysis system is implemented with
different architectures on different platforms to provide valuable information by
analyzing large-scale social data in an efficient and timely manner. First, Sentiment
Analysis is implemented on a traditional analytics platform, with model selection
performed by comparing the performance of three machine learning algorithms
(Naïve Bayes, Random Forest and Linear Regression). For
developing a scalable, high-performance Sentiment Analysis system, Sentiment
Analysis is implemented on a Big Data Analytics Platform (Hadoop MapReduce). The
system achieves high sentiment classification performance by combining the
effortless setup of a lexicon-based classifier with a learning-based classifier.
A Multi-tier Sentiment Analysis system on Big Data
Analytics Platform (MSABDP) is developed to achieve high performance in
multiclass classification. This system combines lexicon-based and learning-based
classification schemes with a Multi-tier architecture. A Multi-tier Sentiment
Analysis system with sarcasm detection on Hadoop (MSASDH) is proposed to
achieve high sentiment classification performance. MSASDH identifies
sarcasm and sentiment-emotion by applying a rule-based sarcasm-sentiment detection
scheme and sentiment classification with a Multi-tier architecture. A Real-time
Multi-tier Sentiment Analysis system (RMSA) is implemented to achieve high
performance in multiclass classification in real time. To improve classification
accuracy, a suitable classifier is selected by comparing the accuracy of three
learning-based multiclass classification techniques: Naïve Bayes, Linear SVC and
Logistic Regression.
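
As an illustration only, the combination of a lexicon-based and a learning-based
classifier in a Multi-tier architecture might be sketched as follows. The abstract
does not specify what each tier does, so the split used here (tier 1 assigns a coarse
polarity from a lexicon, tier 2 refines it with one Naïve Bayes model per polarity)
is an assumption, as are the toy lexicon, the training sentences, the fine-grained
labels, and all function and class names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a real system would use a published sentiment lexicon.
LEXICON = {"love": 2, "great": 1, "good": 1, "bad": -1, "awful": -2, "hate": -2}


def tier1_polarity(text):
    """Tier 1 (assumed): coarse polarity from the lexicon, no training needed."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            lp = math.log(n / total)  # log prior
            for w in text.lower().split():
                lp += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            if lp > best_lp:
                best, best_lp = label, lp
        return best


# Hypothetical fine-grained training data, grouped by coarse polarity.
TRAIN = {
    "positive": [("love this phone so happy", "joy"),
                 ("good day feeling happy", "joy"),
                 ("great news cannot wait", "excitement"),
                 ("great launch cannot wait", "excitement")],
    "negative": [("hate this awful service angry", "anger"),
                 ("bad service makes me angry", "anger"),
                 ("bad news feeling sad", "sadness"),
                 ("awful loss so sad", "sadness")],
}

# Tier 2 (assumed): one learned model per polarity, each seeing only its own subset.
TIER2 = {
    polarity: NaiveBayes().fit([t for t, _ in rows], [y for _, y in rows])
    for polarity, rows in TRAIN.items()
}


def classify(text):
    """Route the text through tier 1, then through the matching tier-2 model."""
    polarity = tier1_polarity(text)
    if polarity == "neutral":
        return "neutral"
    return TIER2[polarity].predict(text)


print(classify("love it so happy"))
print(classify("the meeting is at noon"))
```

In this sketch each tier-2 model only has to separate the labels within one polarity,
which is one plausible reason a Multi-tier design could outperform a single model
trained on all labels at once.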
On the traditional analytics platform, the Naïve Bayes classifier performs best and
the proposed system achieves promising accuracy. The evaluation results show
that the proposed system on the Big Data Analytics Platform achieves a promising
accuracy of 84.2% and scales to large data sets, with running time decreasing as
more nodes are added to the cluster. The evaluation
results show that the proposed MSABDP significantly improves classification
accuracy, by 7%, over multiclass classification based on a Single-tier architecture.
They also show that detecting sarcasm can enhance the accuracy of Sentiment
Analysis, that Real-time Multi-tier Sentiment Analysis achieves promising accuracy,
and that Linear SVC outperforms the other techniques for Real-time Multi-tier
Sentiment Analysis.