UCSY's Research Repository

DATA SCRUTINY FOR NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION


dc.contributor.author Phyu, Win Lai Lai
dc.date.accessioned 2024-03-29T03:21:12Z
dc.date.available 2024-03-29T03:21:12Z
dc.date.issued 2024-03
dc.identifier.uri https://onlineresource.ucsy.edu.mm/handle/123456789/2799
dc.description.abstract This dissertation investigates data augmentation and data scrutinizing methods for developing a speech dataset for text-independent Burmese speaker identification in the open-set case, in which the test speaker may not have been pre-modeled and included in the classifier. The acoustic models are trained using a Gaussian Mixture Model-Universal Background Model (GMM-UBM) and a Time Delay Neural Network (TDNN). Because no speech dataset for speaker identification research in Burmese was available, a speech dataset for speaker identification is constructed first. The data are collected from two domains: web-based news data and recorded daily conversations. Using this dataset, state-of-the-art acoustic speaker models for Burmese speaker identification are constructed. Speaker identification is the task of analyzing speaker characteristics in speech to identify individuals. The identification task performs better when there is enough background training data. Collecting a sufficient amount of speech data in a short time is very challenging when building a Burmese speaker identification system, because Burmese can be considered an under-resourced language in terms of linguistic resource availability. To obtain a sufficient amount of background training data, the MUSAN dataset is used for speech data augmentation. To obtain high-quality training data, several data scrutinizing techniques are investigated. Among them, two data scrutinizing methods, increasing the speech intensity to a signal-to-noise ratio (SNR) of 10 dB and lowering the tempo factor by 0.2 without affecting the pitch of the utterances, are applied to the original speech dataset. Moreover, a white noise-added dataset is created from the original dataset to demonstrate that any kind of noise can degrade identification performance. Mel Frequency Cepstral Coefficients (MFCCs) are used in front-end processing to extract speaker-specific features.
In this work, TDNN-based and GMM-UBM-based acoustic speaker models are constructed from the original, scrutinized, and white noise-added training data. The results indicate the impact of speech data quality on speaker model construction when scrutinized training data are used, and highlight the important role of speaker models in the identification process. Speaker identities are assessed with the probabilistic linear discriminant analysis (PLDA) approach. System performance is reported in terms of Equal Error Rate (EER) and detection accuracy (Acc). en_US
dc.language.iso en en_US
dc.publisher University of Computer Studies, Yangon en_US
dc.subject DATA SCRUTINY en_US
dc.subject NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION en_US
dc.title DATA SCRUTINY FOR NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION en_US
dc.type Thesis en_US
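The abstract describes mixing noise into training speech (MUSAN-based augmentation and a white noise-added copy of the dataset). As a rough illustration of how a noise signal is usually scaled to a target signal-to-noise ratio, the following minimal Python sketch uses the standard definition SNR_dB = 10 log10(P_speech / P_noise). It is not the procedure used in the dissertation: the function names (mean_power, mix_at_snr, white_noise), the 440 Hz tone standing in for an utterance, and the use of plain Python sample lists are all assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the noise sits at `snr_db` below the speech.

    Since SNR_dB = 10 * log10(P_speech / P_noise), the noise is rescaled to
    P_noise = P_speech / 10 ** (snr_db / 10) before being added.
    """
    # Repeat the noise if it is shorter than the utterance, then trim it.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


def white_noise(num_samples, seed=0):
    """Zero-mean Gaussian white noise (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


if __name__ == "__main__":
    sample_rate = 16000
    # A one-second 440 Hz tone stands in for a real utterance.
    speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noisy = mix_at_snr(speech, white_noise(len(speech)), snr_db=10.0)
    added = [y - s for y, s in zip(noisy, speech)]
    achieved = 10 * math.log10(mean_power(speech) / mean_power(added))
    print(f"achieved SNR: {achieved:.2f} dB")
```

Practical pipelines usually work on NumPy arrays or rely on toolkit augmentation scripts; plain lists are used here only to keep the sketch dependency-free.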
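System performance is reported as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate over a set of scored trials (for example, PLDA scores). The minimal Python sketch below assumes that a higher score means "same speaker", sweeps the decision threshold over the observed scores, and does not interpolate between operating points, so toolkit implementations may report slightly different values. The function name and the example scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over every observed score.

    A trial is accepted when its score >= threshold.
    FRR: fraction of target (same-speaker) trials that are rejected.
    FAR: fraction of non-target (different-speaker) trials that are accepted.
    The EER is where FAR and FRR meet; this sketch returns their mean at the
    threshold where |FAR - FRR| is smallest (no interpolation).
    """
    best_gap, best_eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from this work.
    targets = [0.9, 0.8, 0.7, 0.3]
    nontargets = [0.1, 0.2, 0.4, 0.6]
    print(equal_error_rate(targets, nontargets))  # 0.25
```

With these toy scores, one target trial (0.3) falls below the crossing threshold and one non-target trial (0.6) lies above it, so FAR and FRR are both 0.25 at that point.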

