Abstract:
Data mining is the process of analyzing data from different perspectives and
summarizing it into useful information. Classification is a data mining technique
which addresses the problem of constructing a predictive model for a class attribute
given the values of other attributes and some examples of records with known class.
The decision tree is one of the most well-established classification methods.
This thesis presents a weighted C4.5 decision tree algorithm for breast cancer
classification and compares its results with those of the traditional C4.5
algorithm. The weighted C4.5 algorithm assigns appropriate weights to the training
instances, derived from naïve Bayes, before constructing the decision tree
model. The aim of the proposed system is to examine the performance of the
weighted C4.5 decision tree algorithm. According to the experimental results, the
accuracy of weighted C4.5 is 99.56%, compared with 94.27% for traditional C4.5.
The weighted C4.5 algorithm therefore outperforms the traditional C4.5 algorithm
on the breast cancer dataset. The system is implemented in Java.
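
To illustrate how instance weights can enter a C4.5-style tree, the following is a minimal sketch of a weighted class entropy, the impurity measure on which information gain is based. It assumes the per-instance weights have already been computed (for example from naïve Bayes, whose exact weighting scheme is not given here). The class name WeightedEntropy and its method signature are hypothetical and are not taken from the thesis implementation.

```java
/** Sketch only: weighted class entropy for a set of training instances. */
final class WeightedEntropy {

    /**
     * Entropy in bits of the class distribution, where each instance
     * contributes its weight instead of a count of 1.
     * labels[i] must lie in [0, numClasses); weights[i] must be non-negative.
     */
    static double entropy(int[] labels, double[] weights, int numClasses) {
        double[] classWeight = new double[numClasses];
        double total = 0.0;
        for (int i = 0; i < labels.length; i++) {
            classWeight[labels[i]] += weights[i];
            total += weights[i];
        }
        if (total <= 0.0) {
            return 0.0; // empty or zero-weight node: treat as pure
        }
        double h = 0.0;
        for (double w : classWeight) {
            if (w > 0.0) {
                double p = w / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 1};          // two instances per class
        double[] uniform = {1, 1, 1, 1};      // unweighted case (plain C4.5)
        double[] skewed = {3, 3, 1, 1};       // assumed, illustrative weights
        System.out.println(entropy(labels, uniform, 2));
        System.out.println(entropy(labels, skewed, 2));
    }
}
```

With uniform weights the measure reduces to the ordinary entropy used by traditional C4.5. Non-uniform weights shift the class proportions seen at each node, which is how a weighting step can change the splits the tree selects.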