Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu

Doctoral Thesis

2022

Abstract
Natural Language Generation (NLG) systems are used to generate text in order to reduce manual effort. Most existing systems are built to support European languages with simple and/or well-documented grammars. IsiZulu and isiXhosa, two of the largest South African languages by number of first-language speakers, have received little attention in the field despite the potential impact of NLG systems for their speakers. The existing NLG systems created for these languages rely on ad hoc methods for surface realisation, the process of generating text from a system's abstract representations of sentences. These methods combine templates and grammar rules, since the languages are low-resourced and grammatically rich. However, they do not use their scant linguistic resources efficiently, do not rely on a template specification that supports interoperability, and do not use an architecture that yields easy-to-maintain software, since none exists. The objectives of this thesis are to create the foundations for easy-to-maintain and reusable surface realisation tools for isiXhosa and isiZulu by establishing a principled way to pair templates and grammar rules, organise surface realisation modules such that the components are modular, analysable, and reusable, and create template specifications that are interoperable. A further objective is to demonstrate that the aforementioned objectives can be achieved while generating good-quality isiXhosa and isiZulu text in the data-to-text and knowledge-to-text areas. We achieve these objectives by developing a model-based approach of pairing templates and Computational Grammar Rules (CGRs) to obtain linguistically well-founded templates that are suitable for low-resourced and grammatically rich languages.
To obtain interoperable template specifications, we created a task ontology using a bottom-up approach and evaluated it via the standard practice of using Competency Questions (CQs) and removing inconsistencies via an automated reasoner. We also created an architecture that satisfies the most maintainability features from the BS ISO/IEC 25010:2011 standard. In addition, we created proof-of-concept text generation tools that use the proposed approaches and artifacts to generate isiZulu and isiXhosa text, and surveyed speakers of the two languages to establish the quality of the text. We found that most (57%) of the generated isiXhosa texts are judged positively and that there is no consensus on the remaining texts, possibly due to differences in dialect. In addition, most (83%) of the generated isiZulu texts are also judged positively, in that each has at most one participant who considers it to be ungrammatical and unacceptable.