Abstract:
Big data analysis has become widespread in recent years and is applied across many
industries. Data mining is a technique based on statistical methods. It is the process
of discovering hidden or previously unknown patterns in large datasets that are
potentially useful and ultimately understandable. The goal of data mining is to extract
useful information from large datasets and to store it as an understandable, structured
model for future use, using a combination of techniques from statistics, machine
learning and database systems. Classification is a supervised
method used to predict the categorical class label of a given data instance, so as
to assign it to one of a set of predetermined classes. The decision tree is among the
simplest and most commonly used classification algorithms. This system analyses
the performance of the CART and C5.0 algorithms in the training and testing phases
on two UCI datasets, the car evaluation and credit card datasets, using evaluation
metrics such as accuracy, processing time and decision rules. Classification is a
two-step process: in the first step, the algorithm uses training data to build a
classifier, and in the second step it uses this classifier to estimate the class label
of a data instance. The classifier acts as a function that maps a data instance to a
label. The system aims to compare the results of the two algorithms and to determine
whether either one significantly outperforms the other.
This system is implemented in the C# programming language using Microsoft
Visual Studio 2013, with Microsoft SQL Server Management Studio 2012 used to
build the database.
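
The two-step view of classification described above (build a classifier from training
data, then apply it as a function from instance to label) can be illustrated with a
minimal sketch. This is not the system's C# implementation and it is not a full CART or
C5.0 tree: it is a one-level tree (a decision stump) that picks the single attribute with
the lowest weighted Gini impurity, the split criterion associated with CART, and uses a
multiway split rather than CART's binary splits. Python is used only for brevity. The
function names, attribute names and toy rows below are hypothetical and are not taken
from the UCI datasets.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def train_stump(rows, labels):
    """Step 1: build a one-level classifier from training data.

    rows   -- list of dicts mapping attribute name -> categorical value
    labels -- list of class labels, one per row
    Returns a function that maps a data instance to a class label.
    """
    majority = Counter(labels).most_common(1)[0][0]
    best_attr, best_score, best_table = None, float("inf"), {}
    for attr in rows[0]:
        # Group the training labels by the value this attribute takes.
        groups = defaultdict(list)
        for row, label in zip(rows, labels):
            groups[row[attr]].append(label)
        # Weighted Gini impurity of the partition induced by this attribute.
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
            # Each attribute value predicts the majority label of its group.
            best_table = {
                value: Counter(g).most_common(1)[0][0]
                for value, g in groups.items()
            }

    def classify(instance):
        # Attribute values unseen in training fall back to the overall majority class.
        return best_table.get(instance[best_attr], majority)

    return classify


# Hypothetical toy training data (illustrative only, not rows from a UCI dataset).
train_rows = [
    {"safety": "low", "persons": "4"},
    {"safety": "low", "persons": "2"},
    {"safety": "high", "persons": "4"},
    {"safety": "high", "persons": "2"},
]
train_labels = ["unacc", "unacc", "acc", "acc"]

# Step 1: build the classifier. Step 2: use it to label a new instance.
classifier = train_stump(train_rows, train_labels)
predicted = classifier({"safety": "high", "persons": "2"})
```

In this toy data the hypothetical `safety` attribute separates the two classes perfectly,
so the stump splits on it and ignores `persons`. A full decision tree would apply the
same selection recursively to each resulting subset.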