Abstract:
Audio identification techniques for unknown songs are very popular in today's music
industry because they can automatically identify a song from a short piece of audio. The
research methodologies for audio identification systems vary according to the acoustic
feature extraction method used, such as Mel Frequency Cepstral Coefficients (MFCC),
Bark-scale acoustic features, Filter Bank Energy (FBE), etc.
The extracted features form a compact representation of the audio, commonly
known as an audio fingerprint. Audio fingerprint extraction is the main
technique for audio identification systems and is used by large international music
companies such as Gracenote, Pandora, and Apple Music. One of the main features of
audio fingerprinting is that a full song can be detected from a short piece of audio,
which only needs to be between 3 and 10 seconds long, depending on the granularity and
robustness required.
As the digital age shifts to streaming instead of buying songs one by
one from online distribution platforms, digital streaming companies like YouTube have
been facing difficulties in enforcing rules and regulations for benefit sharing with content
owners. With the change in music distribution from CD sales to streaming on
digital platforms, authors and content creators have more opportunities to benefit
from their own content, their so-called property.
Unfortunately, our country, Myanmar, is still in the process of making precise laws
and regulations to protect artists and other content owners from those who copy
content illegally. Myanmar has been changing its music distribution style from CD sales
to online music platforms since 2011, and copyright infringement cases were already
being committed in that year.
Dr. U Ko Ko Lwin, founder of Legacy Music Network Company Limited,
said that the distribution market is breaking down because of these unethical violations,
and so the artists concerned receive an unfair share of the benefits. Almost all of the
music industry in Myanmar changed to the online music distribution style after 2015.
FM broadcasting is one of the big businesses in Myanmar. Various songs are
broadcast daily, including old and classic songs. Since the CD distribution market
changed, audiences have become more interested in streaming music and videos.
Letting listeners find out which song they are listening to is a technical
challenge for audio fingerprint extraction. Therefore, an audio identification system
based on audio fingerprint extraction is needed to automatically
detect songs and their related content from FM broadcast audio. Moreover, the
Myanmar music industry urgently needs an efficient broadcast monitoring system to
resolve copyright infringement issues and illegal benefit sharing between artists and
broadcasting stations.
In this thesis, a broadcast monitoring system is proposed for Myanmar FM
radio stations by utilizing space-saving audio fingerprint extraction based on Mel
Frequency Cepstral Coefficients (MFCC). This study focuses on reducing the memory
requirement for fingerprint storage while preserving the robustness of the audio
fingerprints to common distortions such as compression, noise addition, etc. In this
system, a 3-second audio clip is represented by a 2,712-bit fingerprint block. This
significantly reduces the memory requirement when compared to Philips Robust
Hashing (PRH), one of the dominant audio fingerprinting methods, where a 3-second
audio clip is represented by an 8,192-bit fingerprint block. The proposed system is
easy to implement and achieves correct and speedy music identification even on noisy
and distorted broadcast audio streams. In this research work, we used an audio
fingerprint database of 7,094 songs and broadcast audio streams from four local FM
channels in Myanmar to evaluate the performance of the proposed system. The
experimental results showed that the system achieved reliable performance.
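
To make the memory comparison concrete, the following minimal Python sketch works out the per-block storage implied by the two figures quoted above (2,712 bits and 8,192 bits for a 3-second clip). The names PROPOSED_BITS, PRH_BITS, block_bytes, and reduction_ratio are illustrative assumptions and are not taken from the thesis. The sketch covers a single fingerprint block only and makes no claim about total database size, which would depend on song lengths and block overlap.

```python
# Per-block storage comparison using only the two figures stated in the abstract.
# All names below are hypothetical and chosen for illustration.

PROPOSED_BITS = 2712  # proposed MFCC-based fingerprint block (3-second clip)
PRH_BITS = 8192       # Philips Robust Hashing fingerprint block (3-second clip)


def block_bytes(bits: int) -> float:
    """Convert a fingerprint block size from bits to bytes."""
    return bits / 8


def reduction_ratio(proposed_bits: int, baseline_bits: int) -> float:
    """Fraction of storage saved relative to the baseline block size."""
    return 1 - proposed_bits / baseline_bits


saving = reduction_ratio(PROPOSED_BITS, PRH_BITS)
print(f"Proposed block: {block_bytes(PROPOSED_BITS):.0f} bytes")
print(f"PRH block:      {block_bytes(PRH_BITS):.0f} bytes")
print(f"Saving:         {saving:.1%}")
```

Under these figures, a proposed block occupies 339 bytes against 1,024 bytes for a PRH block, a saving of roughly 67% per 3-second block.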