Grammars for generating isiXhosa and isiZulu weather bulletin verbs

 


dc.contributor.advisor Keet, C Maria en_ZA
dc.contributor.author Mahlaza, Zola en_ZA
dc.date.accessioned 2018-05-07T14:23:55Z
dc.date.available 2018-05-07T14:23:55Z
dc.date.issued 2018 en_ZA
dc.identifier.citation Mahlaza, Z. 2018. Grammars for generating isiXhosa and isiZulu weather bulletin verbs. University of Cape Town. en_ZA
dc.identifier.uri http://hdl.handle.net/11427/27997
dc.description.abstract The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due to, among other things, the complexity of Nguni languages. The structure of these languages is very different from that of Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. Towards generating weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that will require conditioning out of the total strings that can be generated. We found that the phonological conditioning process affects at least 45% of strings for isiXhosa, and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor; however, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve the generated text, given the high ratio of verbs it affects. en_ZA
dc.language.iso eng en_ZA
dc.subject.other Natural Language Generation en_ZA
dc.subject.other Computational Linguistics en_ZA
dc.title Grammars for generating isiXhosa and isiZulu weather bulletin verbs en_ZA
dc.type Master Thesis
uct.type.publication Research en_ZA
uct.type.resource Thesis en_ZA
dc.publisher.institution University of Cape Town
dc.publisher.faculty Faculty of Science en_ZA
dc.publisher.department Department of Computer Science en_ZA
dc.type.qualificationlevel Masters
dc.type.qualificationname MSc en_ZA
uct.type.filetype Text
uct.type.filetype Image
dc.identifier.apacitation Mahlaza, Z. (2018). Grammars for generating isiXhosa and isiZulu weather bulletin verbs. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/27997 en_ZA
dc.identifier.chicagocitation Mahlaza, Zola. "Grammars for generating isiXhosa and isiZulu weather bulletin verbs." Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2018. http://hdl.handle.net/11427/27997 en_ZA
dc.identifier.vancouvercitation Mahlaza Z. Grammars for generating isiXhosa and isiZulu weather bulletin verbs. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2018 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/27997 en_ZA
dc.identifier.ris TY - Thesis / Dissertation AU - Mahlaza, Zola DA - 2018 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2018 T1 - Grammars for generating isiXhosa and isiZulu weather bulletin verbs TI - Grammars for generating isiXhosa and isiZulu weather bulletin verbs UR - http://hdl.handle.net/11427/27997 ER - en_ZA


