Abstract:
Speech is the easiest way for people to communicate with each other. Digital processing of
speech signals is essential for fast and accurate automatic speech recognition systems.
Speech recognition is the capability of an electronic device to understand spoken words,
i.e. the process of decoding an acoustic speech signal captured by a microphone or a
mobile phone into a set of words. It is a technology that can be useful in many applications
of daily life, e.g. mobile communications, and it remains a challenge for
human-machine interface (HMI) technology.
This thesis aims to develop an efficient speech recognition system for isolated
Myanmar words based on the theories of digital signal processing, speech processing, and
artificial neural network techniques. The proposed system is intended to achieve
speaker-dependent as well as speaker-independent recognition.
A speech signal is a combination of voiced and unvoiced sounds. In addition, each
word in the speech is typically surrounded by silence, which may hinder
successful speech recognition. Therefore, the input speech recordings are first manually
preprocessed using the Audacity software in order to mark the start and end points of
words and remove unwanted parts such as silences. The system then extracts
acoustically representative features, such as Mel-Frequency Cepstral Coefficients (MFCCs),
from the preprocessed speech signals. Finally, those features are used to train a neural
network recognition model with the backpropagation algorithm for classification and
recognition of input speech. Based on the knowledge learned during training, the
recognition model is expected to recognize the same words when spoken by new speakers
who were not part of training (i.e. speaker-independent recognition).
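
The feature extraction step described above can be illustrated with a minimal sketch that computes MFCCs for a single speech frame. This is not the MATLAB implementation used in the thesis. The 8 kHz sample rate, 256-sample frame, Hamming window, 20 mel filters, 13 coefficients, and all function names are assumptions chosen only for illustration, and a naive DFT is used so that the sketch needs nothing beyond the Python standard library.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame, then return |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # naive O(N^2) DFT, fine for one short frame
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):    # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # peak and falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Return the first num_coeffs MFCCs (including coefficient 0) of one frame."""
    power = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log energy in each mel band; the small constant avoids log(0) on silence.
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                    for row in bank]
    # Unnormalised DCT-II decorrelates the log mel energies.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Usage: one synthetic 256-sample frame of a 440 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(256)]
features = mfcc_frame(tone, SAMPLE_RATE)
```

In a full recognizer, a word would be split into many overlapping frames and the per-frame coefficient vectors would be combined into the input of the neural network. How the thesis combines them is not stated here, so that step is left out.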
The proposed system in this thesis is developed to recognize twenty isolated
Myanmar words, which are the names of cities in Rakhine State, Shan State, and
Kachin State in Myanmar. The system uses a database made up of
training and testing data sets with 2400 and 400 utterances, respectively. The training
words are uttered by 10 speakers (4 males and 6 females) who are university graduate
students. For speaker-independent recognition, the testing utterances are the same words
as in training but uttered by speakers other than those who participated in training. The
proposed system is implemented in MATLAB, and experimental results show that it
achieves a recognition rate of about 93.5% for known speakers (i.e. speaker-dependent)
and 76.5% for unknown speakers (i.e. speaker-independent).
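
The figures above can be checked with a short sketch of the arithmetic. It rests on two assumptions that the abstract does not state: that the training set is balanced across speakers and words, and that the speaker-independent rate is measured over all 400 test utterances. The function names are hypothetical.

```python
def balanced_repetitions(num_utterances, num_speakers, num_words):
    """Repetitions per speaker per word, assuming a perfectly balanced set."""
    per_pair, remainder = divmod(num_utterances, num_speakers * num_words)
    if remainder:
        raise ValueError("utterances do not divide evenly across speakers and words")
    return per_pair


def recognition_rate(num_correct, num_total):
    """Recognition rate as a percentage of correctly recognized utterances."""
    return 100.0 * num_correct / num_total


# 2400 training utterances, 10 speakers, 20 words (balance is an assumption).
reps = balanced_repetitions(2400, 10, 20)

# 306 correct out of 400 is the count that would correspond to 76.5%,
# if the rate is computed over the whole test set (an assumption).
rate = recognition_rate(306, 400)
```

Under these assumptions, each training speaker would have recorded each word 12 times, and a 76.5% rate would correspond to 306 of the 400 test utterances being recognized correctly.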