Automatic Speech Recognition for Amharic
Automatische Spracherkennung für Amharisch
Abate, Solomon Teferra
(2006) An Amharic Speech Corpus for Large Vocabulary Continuous Speech Recognition. Eurospeech Interspeech, 9th European Conference on Speech Communication and Technology. Lisbon, September 4-9, 2005.
Speech recognition, Amharic
Free keywords (German):
Spracherkennung, Amharisch, Hidden-Markov-Modellen, Silben, Sprachkorpus
Free keywords (English):
Speech recognition, Amharic, Hidden Markov Models, Syllables, Speech corpus
54.75, 18.73
Menzel, Wolfgang (Prof. Dr.)
Date of oral examination:
Abstract in English:
In this work we have explored various possibilities for developing a Large Vocabulary Speaker Independent Continuous Speech Recognition System for Amharic.
Amharic is the official language of Ethiopia. Within the Semitic language family, it has the greatest number of speakers after Arabic. Amharic is one of the languages that have their own writing system. There is, however, no speech corpus that can be used for the development of an Automatic Speech Recognition System (ASRS) for Amharic. We therefore developed an Amharic speech corpus that can be used for various kinds of investigations into the nature of spoken Amharic.
Using the corpus, we have developed an ASRS for Amharic based on Hidden Markov Models (HMM). The research was guided by the assumption that, due to their highly regular Consonant Vowel (CV) structure, Amharic syllables lend themselves to use as the basic recognition unit. Indeed, we were able to show that syllable models are a competitive alternative to the standard architecture based on triphone models.
The optimal HMM topology for Amharic CV syllables that we found in our experiments is a model with five emitting states and twelve Gaussian mixtures, without skips or jumps. Using this set of acoustic models, which requires 15 MB of memory, together with the language model and speaker adaptation, we obtained a word recognition accuracy of 90.43% on the evaluation test set at a speed of 2.4 minutes per sentence with 5,000 words.
Among the triphone models tested, a topology with three emitting states, skips, and twelve Gaussian mixtures produced the best results. This set of acoustic models requires 38 MB of memory and achieves a word recognition accuracy of 91.31% at a speed of 3.8 minutes per sentence.
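To make the two topologies concrete, the following is a minimal illustrative sketch and is not taken from the thesis. It builds a mask of allowed state transitions for a left-to-right HMM. The function name `left_to_right_transitions`, the non-emitting entry and exit states (an HTK-style convention), and the reading of a "skip" as bypassing exactly one state are all assumptions made for illustration; the abstract does not define them.

```python
def left_to_right_transitions(n_emitting, allow_skips):
    """Return a 0/1 mask of allowed transitions for a left-to-right HMM.

    Assumed layout (not specified in the abstract): state 0 is a
    non-emitting entry state, states 1..n_emitting emit observations,
    and state n_emitting + 1 is a non-emitting exit state.
    """
    n_states = n_emitting + 2
    mask = [[0] * n_states for _ in range(n_states)]
    mask[0][1] = 1  # entry state always moves to the first emitting state
    for i in range(1, n_emitting + 1):
        mask[i][i] = 1      # self-loop: stay in the same state
        mask[i][i + 1] = 1  # advance to the next state (or to the exit)
        if allow_skips and i + 2 < n_states:
            mask[i][i + 2] = 1  # assumed skip: bypass exactly one state
    return mask


# Five emitting states without skips (the syllable topology described above).
syllable_mask = left_to_right_transitions(5, allow_skips=False)

# Three emitting states with skips (the triphone topology described above).
triphone_mask = left_to_right_transitions(3, allow_skips=True)

for row in triphone_mask:
    print(row)
```

In an actual recognizer, each nonzero entry would hold a trained transition probability, and each emitting state would carry its own mixture of twelve Gaussians; the mask only shows which transitions are permitted.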
We have analyzed the results of our experiments from the point of view of word recognition accuracy, recognition speed, and memory requirements, and concluded that, for Amharic, modeling CV syllables, as represented by the orthographic symbols, is a better alternative to the prevailing modeling units of elementary sounds, such as phones.
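The trade-off behind this conclusion can be checked with simple arithmetic on the figures reported in this abstract. The short sketch below is illustrative only; the dictionary keys and variable names are hypothetical, and the numbers are exactly those quoted for the best syllable and triphone models.

```python
# Figures as reported in the abstract for the best model of each type.
syllable = {"accuracy_pct": 90.43, "memory_mb": 15, "min_per_sentence": 2.4}
triphone = {"accuracy_pct": 91.31, "memory_mb": 38, "min_per_sentence": 3.8}

# Accuracy difference in percentage points (triphone minus syllable).
accuracy_gap = triphone["accuracy_pct"] - syllable["accuracy_pct"]

# How many times more memory and recognition time the triphone models need.
memory_ratio = triphone["memory_mb"] / syllable["memory_mb"]
time_ratio = triphone["min_per_sentence"] / syllable["min_per_sentence"]

print(f"accuracy gap: {accuracy_gap:.2f} percentage points")
print(f"memory ratio: {memory_ratio:.2f}x")
print(f"time ratio:   {time_ratio:.2f}x")
```

On these figures, the triphone models gain about 0.88 percentage points of accuracy while needing roughly 2.5 times the memory and 1.6 times the recognition time per sentence.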
Since our work is the first attempt at developing ASRSs for Amharic, we would like to mention some areas that deserve further investigation: speech corpora, language models, acoustic models, and the application of the recognizers. Our speech corpus is a read-speech corpus and cannot be used to develop, e.g., recognizers for spontaneous speech or telephone-based applications. The acoustic models of our speech recognition system also suffered from a shortage of training speech data. The irregular realization of the sixth-order vowel and the glottal stop consonant, as well as the gemination of the other consonants, are not handled by our pronunciation dictionaries. Due to the rich inflection of Amharic, we also have to deal with the relatively high perplexity of our language models. So far, we have not taken any steps towards applying the recognizers that we have developed. We therefore recommend that researchers and developers give attention to these areas of speech recognition for Amharic.