Comparison of Data Mining Classification Algorithms: C5.0 and CART for Car Evaluation and Credit Card Information Datasets

Maung, Ei Thinzar Win

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2476

Date: 2020-01

Abstract:

Big data analysis has become widespread in recent years and is applied across many industries. Data mining, which builds on statistical methods, is the process of discovering hidden or previously unknown patterns in large datasets that are potentially useful and ultimately understandable. Its goal is to extract useful information from large datasets and to store it as an understandable, structured model for future use, combining techniques from statistics, machine learning and database systems. Classification is a supervised method used to predict the categorical class label of a given data instance, so that the instance is assigned to a predetermined class. It is a two-step process: in the first step, the algorithm uses training data to build a classifier; in the second step, it uses this classifier to estimate the class label of a data instance. The classifier acts like a function that maps a data instance to a label. The decision tree is a simple algorithm and among the most commonly used classification algorithms. This system analyses the performance of the CART and C5.0 algorithms in the training and testing phases on two UCI datasets, the car evaluation and credit card datasets, using accuracy, processing time and decision rules as evaluation metrics. The system aims to compare the results of the two algorithms and to determine whether either one significantly outperforms the other. The system is implemented in the C# programming language with Microsoft Visual Studio 2013, and Microsoft SQL Server Management Studio 2012 is used to build the database.
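
The two-step classification process and two of the evaluation metrics named in the abstract (accuracy and processing time) can be illustrated with a minimal sketch. This is not the thesis's C# implementation, and it is neither full CART nor C5.0: it is a one-level decision tree (a "stump") that picks a single categorical attribute by weighted Gini impurity, the impurity measure CART is based on. The function names `gini`, `train_stump` and `predict`, and the tiny car-like records, are hypothetical and chosen only for illustration; they are not taken from the UCI datasets.

```python
import time
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def group_by_value(rows, labels, attr):
    """Group the class labels by the value each row has for one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def train_stump(rows, labels):
    """Step 1: build a classifier from training data.

    Chooses the attribute whose multiway split has the lowest weighted
    Gini impurity, then stores the majority class for each of its values.
    (Simplification: real CART uses recursive binary splits.)
    """
    best_attr, best_score = None, float("inf")
    for attr in range(len(rows[0])):
        groups = group_by_value(rows, labels, attr)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    groups = group_by_value(rows, labels, best_attr)
    rules = {value: Counter(g).most_common(1)[0][0] for value, g in groups.items()}
    default = Counter(labels).most_common(1)[0][0]
    return {"attr": best_attr, "rules": rules, "default": default}


def predict(model, row):
    """Step 2: map a data instance to a class label."""
    return model["rules"].get(row[model["attr"]], model["default"])


# Hypothetical toy records: (safety, persons) -> acceptability.
train_rows = [("low", "2"), ("low", "4"), ("high", "4"),
              ("high", "2"), ("med", "4"), ("med", "2")]
train_labels = ["unacc", "unacc", "acc", "unacc", "acc", "unacc"]
test_rows = [("high", "4"), ("low", "2"), ("low", "4")]
test_labels = ["acc", "unacc", "unacc"]

start = time.perf_counter()
model = train_stump(train_rows, train_labels)          # training phase
predictions = [predict(model, r) for r in test_rows]   # testing phase
elapsed = time.perf_counter() - start

correct = sum(p == t for p, t in zip(predictions, test_labels))
accuracy = correct / len(test_labels)

print("chosen attribute index:", model["attr"])
print("decision rules:", model["rules"])
print("accuracy:", accuracy)
print("processing time (s):", elapsed)
```

The learned `rules` dictionary plays the role of the decision rules mentioned in the abstract: each entry reads as "if the chosen attribute has this value, predict this class". A comparison of two algorithms would run the same train, predict and measure sequence once per algorithm on identical training and test splits.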
