UCSY's Research Repository

URL Classification Based on Lexical Features by Machine Learning

dc.contributor.author Vung, Cing Gel
dc.contributor.author Win, Yu Yu
dc.date.accessioned 2022-07-05T04:01:30Z
dc.date.available 2022-07-05T04:01:30Z
dc.date.issued 2021-02-25
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2723
dc.description.abstract Malicious websites have become a hub of cybercrime on the internet. Attackers deliver malicious URLs to target users via links, emails, or advertisements. Much previous research has analyzed phishing URL detection with several approaches to reduce the risk. In this work, we investigate the lexical structure of the URL as input for the classification models. The system employs Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Artificial Neural Network (ANN) as classifiers for detecting malicious URLs. The datasets are collected from the PhishTank website to build the proposed system. The approach adopts static lexical features with an imbalanced dataset for safer and faster extraction. The classifiers achieved accuracies of 88%, 87%, and 88%, respectively. With XGBoost, the detection rate is high, with a false positive rate of 0.13% and a false negative rate of 0.07%. The results show that the imbalanced nature of phishing URL data affects the performance of the detection system. en_US
dc.language.iso en_US en_US
dc.publisher ICCA en_US
dc.subject cybersecurity, feature extraction, machine learning, classification en_US
dc.title URL Classification Based on Lexical Features by Machine Learning en_US
dc.type Presentation en_US
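
The abstract describes using static lexical features of a URL as classifier input but does not list them. The following is a minimal sketch of what such extraction might look like, using only the Python standard library. The function name `lexical_features` and every feature in it are illustrative assumptions, not the feature set used by the authors.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as 192.168.0.1 (format check only,
# it does not validate that each octet is <= 255).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Return a few static lexical features computed from the URL string.

    Hypothetical feature set for illustration; nothing is fetched from
    the network, so extraction is fast and never visits the target site.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    for name, value in lexical_features("http://192.168.0.1/login-page.php").items():
        print(f"{name}: {value}")
```

Each URL then becomes one numeric row, which could be passed to a classifier such as XGBoost, SVM, or an ANN. Because the features come from the string alone, no page content is downloaded, which is consistent with the "safer and faster extraction" the abstract mentions.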
