URL Classification Based on Lexical Features by Machine Learning

Vung, Cing Gel; Win, Yu Yu

Nineteenth International Conference On Computer Applications (ICCA 2021)

dc.contributor.author	Vung, Cing Gel
dc.contributor.author	Win, Yu Yu
dc.date.accessioned	2022-07-05T04:01:30Z
dc.date.available	2022-07-05T04:01:30Z
dc.date.issued	2021-02-25
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2723
dc.description.abstract	Malicious websites have become a hub of cybercrime on the internet. Attackers deliver malicious URLs to target users via links, emails, or advertisements. Much previous research has analyzed phishing URL detection using several approaches to reduce the risk. In this work, we investigate the lexical structure of the URL as input to the classification models. The system employs Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Artificial Neural Network (ANN) classifiers to detect malicious URLs. The datasets were collected from the PhishTank website to build the proposed system. The approach adopts static lexical features with an imbalanced dataset for safer and faster extraction. The classifiers achieved accuracies of 88%, 87%, and 88%, respectively. For XGBoost, the detection rate is high, with a false positive rate of 0.13% and a false negative rate of 0.07%. The results show that the imbalanced nature of phishing URL data affects the performance of the detection system.	en_US
dc.language.iso	en_US	en_US
dc.publisher	ICCA	en_US
dc.subject	cybersecurity, feature extraction, machine learning, classification	en_US
dc.title	URL Classification Based on Lexical Features by Machine Learning	en_US
dc.type	Presentation	en_US
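
The abstract describes feeding static lexical features of a URL into the classifiers. The following is a minimal sketch of what such an extraction step might look like, using only the Python standard library. The function name lexical_features and the specific features chosen here are illustrative assumptions; the record does not list the feature set actually used by the authors.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few static lexical features from the URL string alone.

    Hypothetical feature set for illustration; not the authors' list.
    No network request is made, so extraction is safe and fast.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Rough heuristic: labels before the last two host labels.
        # It ignores multi-part suffixes such as "co.uk".
        "num_subdomain_labels": max(host.count(".") - 1, 0),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


# Example URL is made up for demonstration.
features = lexical_features("http://login-secure.example.com/account/verify?id=123")
```

The resulting dictionary could be converted into a numeric feature vector and passed to any of the classifiers named in the abstract. Because only the URL string is inspected, the page itself is never fetched.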