| dc.description.abstract | This dissertation investigates data augmentation and data scrutinizing
methods for developing a speech dataset for text-independent Burmese speaker
identification in the open-set case, in which a test speaker may not have been
pre-modeled or included in the classifier. The acoustic models are built using the
Gaussian Mixture Model-Universal Background Model (GMM-UBM) and Time Delay Neural
Network (TDNN) approaches. Because no speech dataset for speaker identification
research is available for Burmese, such a dataset is first constructed. The data are
collected from two domains: web-based news and recorded daily conversations. Using
this dataset, state-of-the-art acoustic speaker models for Burmese speaker
identification are built.
Speaker identification is the task of analyzing speaker characteristics in speech to
identify individual speakers. Identification performs better when sufficient
background training data are available. Collecting a sufficient amount of speech data
in a short time is very challenging when building a Burmese speaker identification
system, because Burmese is an under-resourced language with limited linguistic
resources. To obtain sufficient background training data, the MUSAN dataset is used
for data augmentation. To obtain high-quality training data, several data scrutinizing
techniques are also investigated. Two of them are applied to the original speech
dataset: increasing the speech intensity to an SNR of 10 dB, and reducing the tempo by
a factor of 0.2 without affecting the pitch of the utterances. In addition, a
white-noise-added dataset is created from the original dataset to show that any kind
of noise can degrade identification performance. Mel Frequency Cepstral Coefficients
(MFCCs) are extracted as speaker-specific features in front-end processing. In this
work, TDNN- and GMM-UBM-based acoustic speaker models are trained on the original,
scrutinized, and white-noise-added training data. These experiments show the impact of
speech data quality on constructing speaker models, as seen with the scrutinized
training data, and highlight the important role of speaker models in the
identification process. Speaker identities are assessed with the probabilistic linear
discriminant analysis (PLDA) approach. System performance is reported in terms of
Equal Error Rate (EER) and detection accuracy (Acc). | en_US |