Abstract:
Social networking platforms such as Facebook, YouTube, Telegram, and
Instagram have become very popular as information technology continues to
develop. Facebook and YouTube are the most popular social media platforms
among both young and old people, especially the young. YouTube users can
subscribe to channels and post their opinions as comments. This system is developed as a
YouTube comment spam classification framework using term frequency-inverse
document frequency (TF-IDF) and Multinomial Naïve Bayes. TF-IDF is a statistical
method that measures the weight, or score, of each word in a document relative to the whole
corpus. The system is implemented with ASP.NET in the Microsoft Visual Studio
2015 IDE, with Microsoft SQL Server 2017 Express as the database engine. In this
system, 1965 comments posted on five music videos by
five singers (PSY, Katy Perry, LMFAO, Eminem, and Shakira) uploaded on YouTube
are collected as the data set. The purpose of this system is to classify YouTube
comments as spam or legitimate (ham) based on their content, using a
Multinomial Naïve Bayes classifier. Finally, the system evaluates classification
performance using precision, recall, and F-measure.
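
The pipeline summarized above (TF-IDF weighting, Multinomial Naïve Bayes classification, and evaluation by precision, recall, and F-measure) can be illustrated with a minimal sketch. It is written in Python for brevity rather than in ASP.NET, and it is not the system's actual implementation. Several points are assumptions: the whitespace tokenizer, the smoothed IDF variant, the use of TF-IDF weights as fractional counts in the Naïve Bayes likelihoods, and Laplace smoothing with alpha = 1. All function names are hypothetical, and the six training comments are invented toy examples, not drawn from the 1965-comment data set.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    class_counts = Counter(labels)
    totals = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            totals[y][t] += w
    model = {}
    for y, n_y in class_counts.items():
        denom = sum(totals[y].values()) + alpha * len(vocab)
        log_like = {t: math.log((totals[y][t] + alpha) / denom) for t in vocab}
        model[y] = (math.log(n_y / len(labels)), log_like)
    return model


def predict(model, vec):
    """Return the class with the highest log posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall, and F-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy comments for illustration only (not the paper's data set).
TRAIN = [
    ("check out my channel and subscribe", "spam"),
    ("free gift card click this link", "spam"),
    ("subscribe to my channel for free music", "spam"),
    ("i love this song so much", "ham"),
    ("this video is amazing", "ham"),
    ("best song of the year", "ham"),
]

train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
model = train_mnb([tfidf(d, idf) for d in train_docs], train_labels, set(idf))


def classify(text):
    """Classify a raw comment string as 'spam' or 'ham'."""
    return predict(model, tfidf(tokenize(text), idf))
```

On this toy model, classify("please subscribe to my channel") returns "spam" and classify("love this amazing song") returns "ham". A real evaluation would hold out part of the collected comments and pass the true and predicted labels to precision_recall_f1.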