THE ANALYSIS ON THE POTENTIAL BREAST CANCER BY USING BIG DATA ENVIRONMENT

Ei, Yee Mon

dc.contributor.author	Ei, Yee Mon
dc.date.accessioned	2022-10-03T15:45:44Z
dc.date.available	2022-10-03T15:45:44Z
dc.date.issued	2022-09
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2750
dc.description.abstract	Big data is now widely used in healthcare for disease prediction. Breast cancer is the most common cancer among women worldwide, and detecting it at an early stage improves the chance of a cure. In this system, a scalable and fault-tolerant pipeline model is proposed for analyzing large volumes of cancer data and predicting whether cells are cancerous. A large amount of digital data is generated from many sources every second of the day, and one challenge is the volume of this data combined with its high dimensionality. Most traditional machine learning algorithms perform poorly, in both training time and classification results, when extracting hidden insights from such high-dimensional data. The model is developed on the Apache Spark framework using the Random Forest algorithm, and the data source is the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California, Irvine (UCI) Machine Learning Repository. The Spark-based Random Forest classifier is compared with Naïve Bayes in terms of accuracy, precision, recall and F-measure. The evaluation results show that the proposed system achieves an accuracy of 98.2% in the big data analytics environment. The system is implemented in the Scala programming language on the Linux platform.	en_US
dc.language.iso	en	en_US
dc.subject	BIG DATA ENVIRONMENT	en_US
dc.title	THE ANALYSIS ON THE POTENTIAL BREAST CANCER BY USING BIG DATA ENVIRONMENT	en_US
dc.type	Thesis	en_US
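The abstract compares Random Forest and Naïve Bayes by accuracy, precision, recall and F-measure but does not define them. For reference, the standard binary-classification definitions are given below, writing TP, TN, FP and FN for true positives, true negatives, false positives and false negatives. Two assumptions apply: that the malignant class is treated as positive, and that "F-measure" means F1, the harmonic mean of precision and recall. The record states neither, and it does not say whether the thesis averages the metrics over classes, so this is a sketch of the usual form rather than the thesis's exact computation.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{Precision} &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall}    &= \frac{TP}{TP + FN} \\[4pt]
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```

Under this definition, the reported accuracy of 98.2% means that about 98 of every 100 test samples were classified correctly. Precision and recall separate the two kinds of error, which matters in diagnosis because a missed malignant case (FN) and a false alarm (FP) carry different costs.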