Abstract:
Over the years, social media sites have come to play
an important role in communication and have become a source of
a tremendous amount of real-time data, such as images and
videos. YouTube is likely the most popular of them,
with millions of uploaded videos and billions of
comments on these videos. This paper presents a
new music dataset consisting of YouTube comments
written in the Myanmar language, intended for use in
sentiment analysis. Data preprocessing is particularly
important for the Myanmar language because it
improves the quality of the raw data and converts
the raw data into a clean dataset. The preprocessing
of music comments is carried out in five phases: the basic phase,
the removing phase, the segmentation phase, the replacement
phase, and the translation phase. The output of YouTube
comment preprocessing supports better sentiment
analysis. Results show that the preprocessing
approaches have a significant effect on the musical
opinion extraction process using information gain.
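
As an illustration of the information gain measure mentioned above, the following is a minimal sketch, not the paper's implementation. It assumes each comment has already been preprocessed into a list of tokens and carries a sentiment label. The function names, the toy English tokens, and the labels "pos" and "neg" are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the presence or absence of `term`
    with respect to the sentiment labels.

    docs   : list of token lists (one per preprocessed comment)
    labels : list of sentiment labels, aligned with docs
    """
    with_term = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_term = [y for tokens, y in zip(docs, labels) if term not in tokens]
    n = len(labels)
    # Entropy remaining after splitting the comments on the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy data standing in for segmented, cleaned comments.
docs = [["good", "song"], ["bad", "song"], ["good", "voice"], ["bad", "voice"]]
labels = ["pos", "neg", "pos", "neg"]

ig_good = information_gain(docs, labels, "good")  # term separates the classes
ig_song = information_gain(docs, labels, "song")  # term carries no class signal
```

In this toy setting, a term that appears only in comments of one class receives a high score, while a term spread evenly across both classes scores zero. This is why cleaner tokens after preprocessing can change which terms are ranked as informative.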