UCSY's Research Repository

THE ANALYSIS ON THE POTENTIAL BREAST CANCER BY USING BIG DATA ENVIRONMENT

dc.contributor.author Ei, Yee Mon
dc.date.accessioned 2022-10-03T15:45:44Z
dc.date.available 2022-10-03T15:45:44Z
dc.date.issued 2022-09
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2750
dc.description.abstract Big data is now widely used in healthcare for disease prediction. Breast cancer is the most common cancer among women worldwide, and detecting it at an early stage improves the chance of a cure. This thesis proposes a scalable, fault-tolerant pipeline model for analyzing large cancer datasets and predicting cancerous cells. Large volumes of digital data are generated continuously, and one challenge is handling that volume together with high dimensionality: most traditional machine learning algorithms perform poorly in both training time and classification results when extracting hidden insights from such high-dimensional data. The model is built on the Apache Spark framework using the Random Forest algorithm, with the Wisconsin Diagnostic Breast Cancer dataset from the University of California, Irvine (UCI) Machine Learning Repository as the data source. The Spark-based Random Forest implementation is compared with Naïve Bayes in terms of accuracy, precision, recall, and F-measure. The evaluation results show that the proposed system achieves an accuracy of 98.2% in the big data analytics environment. The system is implemented in the Scala programming language on the Linux platform. en_US
dc.language.iso en en_US
dc.subject BIG DATA ENVIRONMENT en_US
dc.title THE ANALYSIS ON THE POTENTIAL BREAST CANCER BY USING BIG DATA ENVIRONMENT en_US
dc.type Thesis en_US
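
The abstract compares the two classifiers on accuracy, precision, recall, and F-measure. As a rough illustration of how these four metrics follow from a binary confusion matrix, here is a minimal sketch in plain Python. It is not the thesis's Scala/Spark implementation; the function name `classification_metrics` and the example counts are hypothetical and are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: malignant cases predicted malignant (true positives)
    fp: benign cases predicted malignant (false positives)
    fn: malignant cases predicted benign (false negatives)
    tn: benign cases predicted benign (true negatives)
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not taken from the thesis).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting, recall on the malignant class is often weighted most heavily, because a false negative means a missed cancer; reporting all four metrics side by side, as the abstract does, makes that trade-off visible.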

