Using question-specific vocabularies to support speech data collection with SALAAM

Chibuye, Kayokwa Nick

Master Thesis

2019

Abstract

Small-vocabulary spoken dialogue systems are increasingly used in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in settings where low literacy is a major hindrance to the adoption of Information and Communication Technologies (ICTs). Because the languages spoken in these areas are computationally low-resourced, such systems rely on techniques such as cross-language phoneme mapping to facilitate fast development of small-vocabulary speech recognisers. Despite the success of this technique, there has been a lack of guidance on how to deploy such systems across a range of languages. This study presents a systematic exploration of the suitability and limitations of cross-language phoneme mapping for developing small-vocabulary speech recognisers for computationally low-resource languages, particularly Bantu languages. Five target languages and four source languages were used in the study. Speech-based Accent Learning And Articulation Mapping (SALAAM), a cross-language phoneme mapping algorithm, was used in the study through its implementation in the open-source tool Lex4All. The following research questions guided our investigations: i) What impact does source language choice have on recognition accuracy? ii) What impact does the gender composition of the training dataset have on recognition accuracy? iii) What impact does varying the number of alternative pronunciations per word type have on recognition accuracy? Data for the target languages was collected from 104 university student volunteers (58 female and 46 male). The results showed that the phonetic similarity between target and source languages, as well as the gender composition of the training datasets, affect the recognition accuracy of speech applications developed using cross-language phoneme mapping techniques. They also showed that increasing the number of alternative pronunciations per word in the vocabulary generally increases recognition accuracy, although at the cost of a slower system response time. This study provides evidence that careful selection of the source language, the gender composition of the training data, and the number of alternative pronunciations per word can improve the recognition accuracy of speech recognisers developed using cross-language phoneme mapping.

Keywords

computer science

Collections

Masters

Full item page