THE ANALYSIS ON THE POTENTIAL BREAST CANCER BY USING BIG DATA ENVIRONMENT

Ei, Yee Mon

URI: https://onlineresource.ucsy.edu.mm/handle/123456789/2750

Date: 2022-09

Abstract:

Big data is now widely used in healthcare for disease prediction. Breast cancer is the most common cancer among women worldwide, and detecting it at an early stage improves the chance of a cure. This work proposes a scalable, fault-tolerant pipeline model for analyzing large volumes of cancer data and predicting cancerous cells. Digital data is generated continuously and in large amounts, and one of the resulting challenges is the volume and high dimensionality of that data. Most traditional machine learning algorithms perform poorly on such high-dimensional data, both in training time and in classification results, when used to uncover hidden insights. The model is developed on the Apache Spark framework using the Random Forest algorithm, with the Wisconsin Diagnostic Breast Cancer dataset from the University of California, Irvine (UCI) Machine Learning Repository as the data source. The Spark-based Random Forest is compared with Naïve Bayes in terms of accuracy, precision, recall, and F-measure. The evaluation results show that the proposed system achieves an accuracy of 98.2% in the big data analytics environment. The system is implemented in the Scala programming language on a Linux platform.
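
For readers unfamiliar with the four evaluation measures named in the abstract, the following minimal sketch shows how they are commonly computed from binary confusion-matrix counts. It is written in plain Python for illustration only and is not the thesis's Scala/Spark code. The function name classification_metrics and the example counts are hypothetical and are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from
    binary confusion-matrix counts (positive class = malignant).

    tp/fp/fn/tn are true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is F1, the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts chosen only to illustrate the call; not thesis data.
example = classification_metrics(tp=50, fp=10, fn=5, tn=35)
```

Accuracy alone can hide missed malignant cases when the classes are imbalanced, so recall and F-measure are usually read alongside it when two classifiers are compared.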
