DATA SCRUTINY FOR NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION

Phyu, Win Lai Lai

dc.contributor.author	Phyu, Win Lai Lai
dc.date.accessioned	2024-03-29T03:21:12Z
dc.date.available	2024-03-29T03:21:12Z
dc.date.issued	2024-03
dc.identifier.uri	https://onlineresource.ucsy.edu.mm/handle/123456789/2799
dc.description.abstract	This dissertation investigates data augmentation and data scrutinizing methods for developing a speech dataset for text-independent Burmese speaker identification in the open-set case, in which the test speaker may not have been pre-modeled and included in the classifier. The acoustic models are trained using the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and the Time Delay Neural Network (TDNN). A speech dataset is first constructed, because no speech dataset is available for speaker identification research in Burmese. The data are collected from two domains: web-based news data and recorded daily conversations. With this dataset, state-of-the-art acoustic speaker models for Burmese speaker identification are constructed. Speaker identification is the task of analyzing speakers’ characteristics in speech to identify individuals precisely. The identification task performs better when there is enough background training data. Collecting a sufficient amount of speech data in a short time is very challenging when building a Burmese speaker identification system, because Burmese can be considered an under-resourced language in terms of available linguistic resources. To obtain a sufficient amount of background training data, the MUSAN speech dataset is used for data augmentation. To obtain high-quality training data, several scrutinizing techniques are investigated. Of these, two data scrutinizing methods are applied to the original speech dataset: increasing the speech intensity to a signal-to-noise ratio (SNR) of 10 dB, and lowering the tempo by a factor of 0.2 without affecting the pitch of the utterances. In addition, a white noise-added dataset is created from the original dataset to show that any kind of noise can degrade identification performance. Mel Frequency Cepstral Coefficients (MFCC) are used as front-end processing to extract speaker-specific features. 
In this work, TDNN-based and GMM-UBM-based acoustic speaker models are constructed from the original, scrutinized, and white noise-added training data. The results indicate the impact of speech data quality on speaker model construction when scrutinized training data are used, and point out the important role of speaker models in the identification process. The speakers’ identities are assessed with the probabilistic linear discriminant analysis (PLDA) approach. System performance is reported in terms of Equal Error Rate (EER) and detection accuracy (Acc).	en_US
dc.language.iso	en	en_US
dc.publisher	University of Computer Studies, Yangon	en_US
dc.subject	DATA SCRUTINY	en_US
dc.subject	NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION	en_US
dc.title	DATA SCRUTINY FOR NEURAL NETWORK-BASED BURMESE SPEAKER IDENTIFICATION	en_US
dc.type	Thesis	en_US