|Title:||Automatic Speech Recognition for Amharic||Other titles:||Automatische Spracherkennung für Amharisch||Language:||English||Author:||Abate, Solomon Teferra||Keywords:||Spracherkennung; Amharisch; Hidden-Markov-Modellen; Silben; Sprachkorpus; Speech Recognition; Amharic; Hidden Markov Models; Syllables; Speech corpus||GND keywords:||Spracherkennung; Amharisch||Date of publication:||2006||Date of oral examination:||2005-12-16||Abstract:||
In this work we have explored various possibilities for developing a Large Vocabulary Speaker Independent Continuous Speech Recognition System for Amharic.
Amharic is the official language of Ethiopia. Within the Semitic language family, it has the greatest number of speakers after Arabic. Amharic is one of the languages that have their own writing system. There is, however, no speech corpus that can be used for the development of an Automatic Speech Recognition System (ASRS) for Amharic. We therefore developed an Amharic speech corpus that can be used for various kinds of investigations into the nature of spoken Amharic.
Using the corpus, we have developed an ASRS for Amharic based on Hidden Markov Models (HMM). The research was guided by the assumption that, due to their highly regular Consonant Vowel (CV) structure, Amharic syllables lend themselves to use as a basic recognition unit. Indeed, we were able to show that syllable models can be used as a competitive alternative to the standard architecture based on triphone models.
The optimal HMM topology for Amharic CV syllables found in our experiments is a model with five emitting states and twelve Gaussian mixtures, without skips or jumps. Using this set of acoustic models, which requires 15 MB of memory, together with the language model and speaker adaptation, we obtained a word recognition accuracy of 90.43% on the evaluation test set at a speed of 2.4 minutes per sentence with 5,000 words.
Among the triphone models tested, a topology with three emitting states, skips, and twelve Gaussian mixtures produced the best results. This set of acoustic models requires 38 MB of memory and has a word recognition accuracy of 91.31% at a speed of 3.8 minutes per sentence.
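To make the difference between the two topologies concrete, the following is a minimal sketch of which state transitions each one allows. It is an illustration only, not the configuration used in the thesis: the function name `left_to_right_topology`, the non-emitting entry and exit states, and the choice that a "skip" jumps over exactly one emitting state are all assumptions.

```python
def left_to_right_topology(n_emitting, allow_skips):
    """Return a matrix of allowed transitions for a left-to-right HMM.

    Assumption: state 0 is a non-emitting entry state and state
    n_emitting + 1 is a non-emitting exit state; states in between emit.
    """
    n = n_emitting + 2
    allowed = [[False] * n for _ in range(n)]
    allowed[0][1] = True  # entry state moves to the first emitting state
    for i in range(1, n_emitting + 1):
        allowed[i][i] = True      # self-loop: stay in the same state
        allowed[i][i + 1] = True  # advance to the next state (or the exit)
        # Assumption: a skip jumps over one emitting state, never to the exit.
        if allow_skips and i + 2 <= n_emitting:
            allowed[i][i + 2] = True
    return allowed


def arcs(allowed):
    """List the allowed transitions as (from_state, to_state) pairs."""
    return [(i, j) for i, row in enumerate(allowed)
            for j, ok in enumerate(row) if ok]


# Syllable model: five emitting states, no skips.
syllable = left_to_right_topology(5, allow_skips=False)
# Triphone model: three emitting states, with skips.
triphone = left_to_right_topology(3, allow_skips=True)

print("syllable arcs:", arcs(syllable))
print("triphone arcs:", arcs(triphone))
```

Under these assumptions the syllable topology is a strict chain of self-loops and forward steps, while the triphone topology adds one arc from the first to the third emitting state.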
We have analyzed the results of our experiments from the point of view of word recognition accuracy, recognition speed, and memory requirements, and concluded that, for Amharic, modeling CV syllables, as represented by the orthographic symbols, is a better alternative to the prevailing modeling units of elementary sounds, such as phones.
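The trade-off behind this conclusion can be read off the figures reported above. The short sketch below only restates those numbers and derives the gaps between them. The `Result` class and its field names are hypothetical. The abstract states that the syllable figure includes the language model and speaker adaptation, but does not spell out the conditions for the triphone figure, so the comparison is indicative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    accuracy_pct: float          # word recognition accuracy in percent
    minutes_per_sentence: float  # recognition speed
    memory_mb: float             # memory needed by the acoustic models


# Figures as reported in the abstract.
syllable = Result(accuracy_pct=90.43, minutes_per_sentence=2.4, memory_mb=15)
triphone = Result(accuracy_pct=91.31, minutes_per_sentence=3.8, memory_mb=38)

# Accuracy difference in percentage points, and cost ratios.
accuracy_gap = triphone.accuracy_pct - syllable.accuracy_pct
speed_ratio = triphone.minutes_per_sentence / syllable.minutes_per_sentence
memory_ratio = triphone.memory_mb / syllable.memory_mb

print(f"accuracy gap: {accuracy_gap:.2f} percentage points")
print(f"triphone time per sentence: {speed_ratio:.2f}x syllable")
print(f"triphone memory: {memory_ratio:.2f}x syllable")
```

In these terms, the triphone models gain 0.88 percentage points of accuracy at roughly 1.6 times the recognition time and 2.5 times the memory of the syllable models.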
Since our work is the first attempt in the area of developing ASRSs for Amharic, we would like to mention some of the areas that deserve further investigation: speech corpora, language models, acoustic models, and the application of the recognizers. Our speech corpus is a read speech corpus and cannot be used to develop, e.g., recognizers for spontaneous speech or telephone-based applications. The acoustic models of our speech recognition system have also suffered from a shortage of training speech data. The irregular realization of the sixth-order vowel and the glottal stop consonant, as well as the gemination of the other consonants, are not handled by our pronunciation dictionaries. Due to the rich inflection of Amharic, we also have to deal with the relatively high perplexity of our language models. So far, we have not taken any steps towards the application of the recognizers that we have developed. We therefore recommend that researchers and developers give attention to these areas of speech recognition for Amharic.
|URL:||https://ediss.sub.uni-hamburg.de/handle/ediss/1420||URN:||urn:nbn:de:gbv:18-29818||Document type:||Dissertation||Advisor:||Menzel, Wolfgang (Prof. Dr.)|
|Appears in collections:||Electronic Dissertations and Habilitations|
This publication is available in electronic form on the Internet and can be read. Beyond free access, the author has not granted any further rights. Acts of use (such as downloading, editing, or redistributing) are therefore permitted only within the limits of the statutory permissions of the German Copyright Act (Urheberrechtsgesetz, UrhG). This applies to the publication as well as to its individual components, unless otherwise indicated.