UCSY's Research Repository

An Efficient Predictive Big Data Analytics System For High Dimensional Data

dc.contributor.author Oo, Myat Cho Mon
dc.date.accessioned 2021-11-16T15:20:07Z
dc.date.available 2021-11-16T15:20:07Z
dc.date.issued 2021-02
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2591
dc.description.abstract In the current era of exploding data volumes, data is increasing dramatically every year. Gaining critical business insights by querying and analyzing this vast amount of data is becoming a challenge for conventional data mining techniques, which are not suited to processing big data beyond the capabilities of traditional systems. The massive numbers of samples and features in big data bring the curse of dimensionality, creating issues such as heavy computational cost and algorithmic instability. Predictive analytics is an enabler of big data, using machine learning algorithms to extract useful knowledge from large amounts of data. The effectiveness and reliability of a predictive analytics system's results depend on the quality of the predictive model. This research aims to develop an efficient Predictive Big data Analytics (PBA) system that provides valuable information and supports better business decisions in an efficient and timely manner. To achieve this goal, the PBA system is implemented with different architectures on big data analytics platforms by examining the bulk of big data. First, a scalability test is carried out by analyzing the performance of machine learning on traditional and big data analytics platforms, with the aim of reducing the generalization error and processing the massive data. The processing performance of two analytics engines (MapReduce and Apache Spark) is evaluated using a scalable machine learning algorithm, and the Spark processing engine is then selected because it is computationally efficient and makes the PBA system relatively easy to implement. To develop a scalable and high-performance PBA system, model selection is performed by evaluating the performance of four different machine learning algorithms (Random Forest, Gradient Boosting, Decision Trees and Linear Regression). The efficient PBA system is established based on a powerful machine learning technique, Scalable Random Forests (SRF).
To obtain a prediction model with high accuracy, hyperparameter optimization is performed in SRF. In addition to mitigating data quality challenges, reducing the high dimensionality of the data improves operational efficiency by minimizing computational and storage costs. A real-time PBA system is developed to achieve high predictive power in real time. Stock data from eight different U.S. companies are captured in real time, and the system predicts whether the stock prices will rise or fall relative to the price n days ago. In the RPBA3 system, the input feature variables of the stock datasets are calculated from technical indicators to help investors decide whether to buy or sell stocks. Experimental results indicate that the prediction accuracy of the proposed PBA system is better than that of the RF algorithm from Spark's scalable machine learning library. The important finding of this research is that combining SRF hyperparameter optimization with dimensionality reduction can considerably improve the efficiency and effectiveness of the system in terms of accuracy and computational time. en_US
dc.language.iso en_US en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.title An Efficient Predictive Big Data Analytics System For High Dimensional Data en_US
dc.type Thesis en_US
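
The abstract describes a prediction target of whether a stock price will rise or fall relative to the price n days ago. As a minimal sketch of how such a target could be derived from a series of closing prices, the following Python example may help. The function name `direction_labels`, the sample prices, and the choice to label a tie as a fall are assumptions made for illustration and are not taken from the thesis.

```python
from typing import List, Optional


def direction_labels(closes: List[float], n: int) -> List[Optional[int]]:
    """Label each day 1 if its close is above the close n days earlier, else 0.

    The first n days have no reference price and are labeled None.
    Assumption: an unchanged price counts as a fall (label 0).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of days")
    labels: List[Optional[int]] = []
    for t, price in enumerate(closes):
        if t < n:
            # No price exists n days before this day.
            labels.append(None)
        else:
            labels.append(1 if price > closes[t - n] else 0)
    return labels


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.5, 10.2, 10.8, 10.1]
print(direction_labels(closes, n=2))  # [None, None, 1, 1, 0]
```

Labels of this kind could serve as the class variable for a classifier such as a random forest, with technical indicators as the input features.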
