Abstract:
Extracting valuable information from data is a challenging task. Because data are often massive, redundant, incorrect, and noisy, an analyst can easily end up with an inaccurate classifier. Errors can also arise from misinterpreting results or from applying a procedure that does not suit the scenario at hand. In this study, we examine Naïve Bayes, one of the principal approaches to data mining, and carry out a comparative study of it using WEKA, an open-source software tool. From the NSL-KDD dataset we use KDDTrain+.arff, and from the DARPA dataset we use DARPAWeek3-1.arff. ARFF stands for Attribute-Relation File Format, the simple dataset format adopted by WEKA. The size of each training set is described by the number of attributes and the number of instances it contains. The classification models are evaluated on the number of class labels in the dataset, the accuracy, the number and length of the rules generated, the error rate, and the standing status of the classification. Across the experiments we performed, Naïve Bayes achieved better accuracy with discretization than without it.
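To make the idea of discretization before Naïve Bayes concrete, the following is a minimal sketch in plain Python rather than WEKA. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not say which discretization method was applied, so equal-width binning is assumed here, and the function names (`equal_width_bins`, `train_nb`, `predict_nb`) and the four-row toy data are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Return a function mapping a numeric value to an equal-width bin index.

    Assumption: equal-width binning stands in for whatever discretization
    the study actually used.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant attributes

    def to_bin(v):
        idx = int((v - lo) / width)
        return max(0, min(n_bins - 1, idx))  # clamp unseen or boundary values

    return to_bin


def train_nb(rows, labels, n_bins=3):
    """Fit a categorical Naive Bayes model on discretized numeric attributes."""
    n_attrs = len(rows[0])
    binners = [equal_width_bins([r[j] for r in rows], n_bins) for j in range(n_attrs)]
    class_counts = Counter(labels)
    # counts[c][j][b]: training rows of class c whose attribute j falls in bin b
    counts = {c: [Counter() for _ in range(n_attrs)] for c in class_counts}
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][binners[j](v)] += 1
    return binners, class_counts, counts, n_bins


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    binners, class_counts, counts, n_bins = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            b = binners[j](v)
            # Laplace-smoothed conditional probability of bin b given class c
            score += math.log((counts[c][j][b] + 1) / (n_c + n_bins))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy data: two numeric attributes, two class labels.
toy_rows = [[1.0, 200.0], [2.0, 180.0], [9.0, 20.0], [10.0, 35.0]]
toy_labels = ["normal", "normal", "attack", "attack"]
toy_model = train_nb(toy_rows, toy_labels)
print(predict_nb(toy_model, [1.5, 190.0]), predict_nb(toy_model, [9.5, 25.0]))
```

Once every attribute is binned, the model only needs per-class bin counts, so no distributional assumption such as normality is placed on the numeric attributes. Whether this improves accuracy on a given dataset still has to be checked experimentally.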