Effective Malicious Features Extraction and Classification for Incident Handling Systems

San, Cho Cho

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/2376

Date: 2019-10

Abstract:

Every day, malware authors create new variants, new infection methods, and increasingly obfuscated malware by using packing and encryption techniques. Malware classification and detection therefore play an important role in cyber security research and remain a major challenge. Because the rate of false alarms is increasing, accurate classification and detection of malware is a pressing problem. This research provides a classification system that differentiates malware from benign software and classifies malware by type. It contributes the Malicious Sample Names Extraction (MSNE) procedure and the Naming Malicious Samples using the Regular Expression (NMS_RE) technique to label the malicious samples. It also contributes the Malware Feature Extraction Algorithm (MFEA), which identifies the dominant features from the generated report files. The features are the API, DLL, and PROCESS items called by malicious and benign executables during automated analysis. During the experiments, the extracted raw data were cleansed, the n-gram technique was applied, and the malicious dataset was represented and prepared for the malware classification system. This work uses two datasets for malware classification. The Benign Malware Classification (BMC) dataset is used for binary classification, to identify whether a sample is malicious, and the Benign Malware Family Classification (BMFC) dataset is used for multi-class classification, to identify the malware family. The Chi-Square and Principal Component Analysis (PCA) feature selection methods are applied to select the best features. Classification algorithms such as k-Nearest Neighbor (kNN), Random Forest (RF), and Support Vector Classification (SVC) are used for both multi-class and binary classification.
The proposed approach, which relies on Machine Learning (ML) classifiers, classifies malicious and benign executable files effectively. The experimental findings show that the extracted API_DLL features yield the best evaluation results in terms of accuracy, confusion matrix (CM), True Positive Rate (TPR), False Positive Rate (FPR), and area under the Receiver Operating Characteristic (ROC) curve.
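
As an illustration only, the n-gram step and two of the evaluation metrics named above (TPR and FPR) can be sketched in plain Python. This is not the MFEA algorithm or the experimental code of this work. The function names, the choice of n = 2, the example API call sequence, and the label strings are all assumptions made for the sketch.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """Count contiguous n-grams over an ordered list of call names.

    `calls` stands in for the API names extracted from one analysis
    report (hypothetical input format; n = 2 is an assumed setting).
    """
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def tpr_fpr(y_true, y_pred, positive="malware"):
    """Return (TPR, FPR) for binary labels, treating `positive` as the
    malicious class (the label strings are assumed for illustration)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # TP / (TP + FN)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # FP / (FP + TN)
    return tpr, fpr


# Illustrative call sequence for one sample (made-up data).
sample_calls = ["CreateFileW", "WriteFile", "CloseHandle",
                "WriteFile", "CloseHandle"]
bigram_counts = api_ngrams(sample_calls, n=2)

# Illustrative ground-truth and predicted labels (made-up data).
true_labels = ["malware", "malware", "malware", "benign", "benign"]
pred_labels = ["malware", "malware", "benign", "benign", "malware"]
rates = tpr_fpr(true_labels, pred_labels)
```

In such a sketch, each sample's n-gram counts would form one row of a feature matrix, which could then be passed to feature selection and to a classifier such as kNN, RF, or SVC.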
