Using question-specific vocabularies to support speech data collection with SALAAM

dc.contributor.advisor: De Renzi, Brian
dc.contributor.author: Chibuye, Kayokwa Nick
dc.date.accessioned: 2020-03-10T13:58:34Z
dc.date.available: 2020-03-10T13:58:34Z
dc.date.issued: 2019
dc.date.updated: 2020-03-10T13:45:53Z
dc.description.abstract: There has been an increasing use of small-vocabulary spoken dialogue systems in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in low-resource settings, in which low literacy is a major hindrance to the adoption of Information and Communication Technologies (ICTs). Since the languages spoken in these areas are computationally low-resourced, such systems rely on techniques such as cross-language phoneme mapping to facilitate fast development of small-vocabulary speech recognisers. Despite the success of this technique, there has been a lack of guidance on how to deploy such systems across a range of languages. This study presents a systematic exploration of the suitability and limitations of using cross-language phoneme mapping for the development of small-vocabulary speech recognisers for computationally low-resource languages, particularly Bantu languages. Five target languages and four source languages were used in the study. Speech-based Accent Learning And Articulation Mapping (SALAAM), a cross-language phoneme mapping algorithm, was used in the study through its implementation in the open-source tool Lex4All. The following research questions guided our investigations: i) What impact does source language choice have on recognition accuracy? ii) What impact does the gender composition of the training data set have on recognition accuracy? iii) What impact does varying the number of alternative pronunciations per word type have on recognition accuracy? Data for the target languages was collected from 104 university student volunteers (58 female and 46 male). The results showed that the phonetic similarity of the target and source languages, as well as the gender composition of the training data sets, affect the recognition accuracy of speech applications developed using cross-language phoneme mapping techniques. They also showed that increasing the number of alternative pronunciations per word in the vocabulary generally increases recognition accuracy, although at the cost of a slower system response time. This study provides evidence that a careful selection of the source language, the gender composition of the training data and the number of alternative pronunciations per word can improve the recognition accuracy of speech recognisers developed using cross-language phoneme mapping.
dc.identifier.apacitation: Chibuye, K. N. (2019). Using question-specific vocabularies to support speech data collection with SALAAM. Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/31535
dc.identifier.chicagocitation: Chibuye, Kayokwa Nick. "Using question-specific vocabularies to support speech data collection with SALAAM." Faculty of Science, Department of Computer Science, 2019. http://hdl.handle.net/11427/31535
dc.identifier.citation: Chibuye, K.N. 2019. Using question-specific vocabularies to support speech data collection with SALAAM. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/31535
dc.identifier.ris: TY - Thesis / Dissertation AU - Chibuye, Kayokwa Nick DA - 2019 DB - OpenUCT DP - University of Cape Town KW - computer science LK - https://open.uct.ac.za PY - 2019 T1 - Using question-specific vocabularies to support speech data collection with SALAAM TI - Using question-specific vocabularies to support speech data collection with SALAAM UR - http://hdl.handle.net/11427/31535 ER -
dc.identifier.uri: http://hdl.handle.net/11427/31535
dc.identifier.vancouvercitation: Chibuye KN. Using question-specific vocabularies to support speech data collection with SALAAM. Faculty of Science, Department of Computer Science, 2019 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/31535
dc.language.rfc3066: eng
dc.publisher.department: Department of Computer Science
dc.publisher.faculty: Faculty of Science
dc.subject: computer science
dc.title: Using question-specific vocabularies to support speech data collection with SALAAM
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc
Files
Original bundle
Name: thesis_sci_2019_chibuye_kayokwa_nick.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format