OpenUCT :: Browsing by Subject "Nguni languages"



Open Access
Strangers to brothers : interaction between south-eastern San and southern Nguni/Sotho communities
(1994) Jolly, Pieter; Parkington, John
There is presently considerable debate about the forms of relationship established between hunter-gatherers and their non-forager neighbours, and about whether the relationships documented as having been established significantly affected these hunter-gatherer societies. In southern Africa, particular attention has been paid to the effects of such contact on hunter-gatherer communities of the south-western Cape and the Kalahari. The aim of this thesis has been to assess the nature and extent of relationships established between the south-eastern San and southern Nguni and Sotho communities, and to identify the extent to which these relationships may have brought about changes in the political, social and religious systems of south-eastern hunter-gatherers. General patterns characterising interaction between farming communities and a number of San and non-San hunter-gatherer societies outside the study area are identified and combined with archaeological and historiographical information to model relationships between the south-eastern San and southern Nguni and Sotho communities. The established and possible effects of these relationships on some south-eastern San groups are presented, as are some of the possible forms in which contact-induced changes in San religious ideology and ritual practice were expressed in the rock art. It is suggested that the ideologies of many south-eastern San communities, rather than remaining continuous throughout the contact period, were significantly influenced by the ideological systems of the southern Nguni and Sotho. It is further suggested that the paintings at the caves of Melikane and upper Mangolong, together with the comments made on these paintings by the 19th-century San informant Qing, should be interpreted with reference to the religious ideologies and ritual practices of the southern Nguni and Sotho as well as those of the San.
Other rock paintings in areas where contact between the south-eastern San and black farming communities was prolonged and symbiotic may need to be similarly interpreted.
Open Access
Subword segmental neural language generation for Nguni languages
(2025) Meyer, Francois Rolihlahla; Buys, Jan
Deep learning models for text generation are now able to produce fluent and coherent text in many conversational settings. However, such models require large training datasets and are primarily designed for a limited number of high-resource languages. These advances do not transfer directly to low-resource languages with distinctive linguistic characteristics. In this thesis, we develop text generation models for the Nguni languages of South Africa: isiXhosa, isiZulu, isiNdebele, and Siswati. The Nguni languages are agglutinative and conjunctively written, so a single written word is formed by stringing together several morphemes. We design neural models that suit the morphological complexity of the Nguni languages by explicitly modelling the segmentation of words into subword units. We propose subword segmental modelling, a neural architecture and training algorithm that learns subword segmentation during training. The standard approach to subword modelling is to apply a data-driven algorithm such as byte-pair encoding (BPE) during preprocessing. Subword segmental modelling departs from this paradigm: instead of treating subword segmentation as a preprocessing step, we incorporate it into end-to-end learning, allowing the model to discover the optimal subword units for a particular language and task. Explicitly modelling the complex subword structure of Nguni languages serves as an inductive bias that makes training on the typically limited data more efficient. In this thesis, we present subword segmental models for three natural language generation tasks. Our first model is for autoregressive language modelling. We propose the subword segmental language model (SSLM), a decoder-only model that learns subword segmentation to optimise its language modelling objective. On average across the four Nguni languages, SSLM achieves lower (better) perplexity-based intrinsic evaluation scores than tokenisation-based language models.
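The abstract does not give SSLM's equations. Segmental language models of this family typically learn segmentation by summing the probability of the text over all possible segmentations with a forward (dynamic-programming) recursion; because the sum is differentiable, a neural implementation can train the segment scorer end-to-end. The sketch below shows only that recursion, plus a bits-per-character conversion (one tokenisation-independent, perplexity-based score). It rests on stated assumptions: `marginal_log_prob`, `toy_score` and `TOY_LEXICON` are hypothetical, the toy scorer ignores history, and a real model would score each segment with a neural network conditioned on the preceding text.

```python
import math
from typing import Callable

# Hypothetical scorer type: (history, segment) -> log p(segment | history).
SegmentScorer = Callable[[str, str], float]


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_prob(text: str, score: SegmentScorer, max_seg_len: int = 6) -> float:
    """Forward algorithm: log p(text) summed over all segmentations.

    alpha[j] is the log-probability of the prefix text[:j], marginalised
    over every way of splitting it into segments of length <= max_seg_len.
    """
    n = len(text)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0  # the empty prefix has probability 1
    for j in range(1, n + 1):
        # Every segment text[i:j] that ends at position j extends a prefix text[:i].
        candidates = [
            alpha[i] + score(text[:i], text[i:j])
            for i in range(max(0, j - max_seg_len), j)
        ]
        alpha[j] = logsumexp(candidates)
    return alpha[n]


def bits_per_character(log_prob: float, num_chars: int) -> float:
    """Convert a natural-log probability into bits per character."""
    return -log_prob / (num_chars * math.log(2))


# Toy, hypothetical lexicon: known subwords get a fixed probability and any
# other segment falls back to a per-character penalty. History is ignored.
TOY_LEXICON = {"ndi": 0.2, "ya": 0.2, "ku": 0.2, "thand": 0.1, "a": 0.1}


def toy_score(history: str, segment: str) -> float:
    if segment in TOY_LEXICON:
        return math.log(TOY_LEXICON[segment])
    return len(segment) * math.log(0.01)


if __name__ == "__main__":
    word = "ndiyakuthanda"
    lp = marginal_log_prob(word, toy_score)
    print(f"log p = {lp:.3f}, BPC = {bits_per_character(lp, len(word)):.3f}")
```

Normalising by characters rather than by tokens is what lets models with different subword inventories be compared on the same scale.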
We also evaluate SSLM as an unsupervised morphological segmenter, showing that its learned subwords are closer to morphemes than standard subword tokens are. Since SSLM is our first instantiation of subword segmental modelling, we present a detailed analysis of the architectural components and hyperparameters that we found to be influential during development. Our second model extends subword segmental modelling to neural machine translation (NMT). We propose subword segmental machine translation (SSMT), an encoder-decoder model that learns target-language subword segmentation to optimise its sequence-to-sequence translation objective. To generate translations with SSMT, we propose dynamic decoding, a decoding algorithm for generating text with subword segmental architectures. SSMT outperforms tokenisation-based NMT on the Nguni languages, achieving large gains in the extremely low-resource setting of English-to-Siswati translation. As with SSLM, we show that SSMT learns subword boundaries that align more closely with morpheme boundaries than tokenisation-based subwords do. SSMT also exhibits greater morphological compositional generalisation: the ability to generalise to novel combinations of known morphemes. We extend SSMT to multilingual translation, where it learns a single target-side subword segmentation scheme to optimise performance across multiple translation directions, and we compare multilingual SSMT to multilingual tokenisation-based NMT. Multilingual SSMT does induce cross-lingual transfer, but to a lesser extent than multilingual tokenisation-based NMT. In cross-lingual finetuning experiments, SSMT improves transfer between unrelated languages. Our experiments confirm that decisions around subword segmentation greatly affect cross-lingual performance. We also show that differences in orthographic word-boundary alignment between languages can impede cross-lingual transfer.
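The abstract states that learned subword boundaries align more closely with morpheme boundaries but does not name the metric. One common way to quantify this is boundary precision, recall and F1 against a gold morpheme segmentation. The hypothetical helpers `boundary_offsets` and `boundary_prf` sketch that computation; the isiXhosa example word "ndiyakuthanda" ('I love you') and its hand-written segmentation are illustrative only, not data from the thesis.

```python
def boundary_offsets(segments: list[str]) -> set[int]:
    """Character offsets of internal boundaries, e.g. ["ndi", "ya"] -> {3}."""
    offsets: set[int] = set()
    position = 0
    for seg in segments[:-1]:  # the word-final edge is not counted as a boundary
        position += len(seg)
        offsets.add(position)
    return offsets


def boundary_prf(predicted: list[list[str]],
                 gold: list[list[str]]) -> tuple[float, float, float]:
    """Micro-averaged boundary precision, recall and F1 over a list of words."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same words")
    tp = fp = fn = 0
    for pred_segs, gold_segs in zip(predicted, gold):
        if "".join(pred_segs) != "".join(gold_segs):
            raise ValueError("segmentations must spell the same word")
        p, g = boundary_offsets(pred_segs), boundary_offsets(gold_segs)
        tp += len(p & g)  # boundaries found in both
        fp += len(p - g)  # predicted boundaries absent from gold
        fn += len(g - p)  # gold boundaries the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["ndi", "ya", "ku", "thand", "a"]]
    predicted = [["ndiya", "ku", "thanda"]]
    print(boundary_prf(predicted, gold))  # (1.0, 0.5, 0.6666666666666666)
```

Here every predicted boundary is correct (precision 1.0), but two gold boundaries are missed (recall 0.5), which is the typical profile of a segmenter that under-splits.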
Our third and final model combines subword segmental modelling with a copy mechanism for the task of data-to-text generation. We propose the subword segmental pointer generator (SSPG), which jointly learns to segment words and to copy subwords in order to optimise data-to-text generation. We also propose unmixed decoding, a text generation algorithm for copy-equipped subword segmental models. On isiXhosa data-to-text generation, SSPG outperforms tokenisation-based architectures trained from scratch. Besides reference-based evaluation, we develop an extractive evaluation framework that measures how faithfully a model's generations capture the expected data content. This evaluation shows that SSPG combines entity copying and morphological composition more effectively. Across all three tasks, and for all four Nguni languages, subword segmental modelling consistently equals or outperforms equivalent tokenisation-based models, with the greatest gains for extremely low-resource languages and tasks. Through linguistically informed evaluations, we show that subword segmental modelling successfully acquires particular aspects of Nguni-language morphology: its subword units resemble morphemes more closely than subword tokens do, and it applies morphological composition effectively. Subword segmental modelling thus proves effective for the Nguni languages, offering a promising new approach to text generation for low-resource, morphologically complex languages.
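SSPG builds on the pointer-generator idea of mixing a generation distribution with a copy distribution over input units. The abstract does not give SSPG's formulation (which additionally learns segmentation), so the sketch below shows only the standard pointer-generator mixture over a fixed set of subword units, with hypothetical names and toy numbers throughout. A generation probability `p_gen` weights the vocabulary distribution, and the remaining mass is spread over source units according to attention, so an entity from the input data (here the made-up name "Thabo") can be copied even when it is not in the output vocabulary.

```python
def pointer_generator_distribution(
    p_gen: float,
    vocab_probs: dict[str, float],
    source_units: list[str],
    attention: list[float],
) -> dict[str, float]:
    """Mix generation and copying:

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source copies of w)
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must be a probability")
    if len(source_units) != len(attention):
        raise ValueError("one attention weight per source unit is required")
    # Generation part: scale the output-vocabulary distribution by p_gen.
    final = {unit: p_gen * prob for unit, prob in vocab_probs.items()}
    for unit, weight in zip(source_units, attention):
        # Copy part: units absent from the vocabulary (e.g. rare names) can still be emitted.
        final[unit] = final.get(unit, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    dist = pointer_generator_distribution(
        p_gen=0.6,
        vocab_probs={"u": 0.5, "ku": 0.5},
        source_units=["Thabo", "ku"],
        attention=[0.75, 0.25],
    )
    print({unit: round(p, 3) for unit, p in dist.items()})  # {'u': 0.3, 'ku': 0.4, 'Thabo': 0.3}
```

Because both component distributions sum to one, the mixture is itself a valid distribution; a unit such as "ku" that can be both generated and copied accumulates mass from both routes.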
