Vision-based automatic translation for South African Sign Language (SASL)

Thesis / Dissertation

2024

Abstract
There are more than four million South Africans who are deaf and hard of hearing (DHH). However, most hearing people neither understand sign language nor know how to sign. This creates a communication barrier between the DHH and the hearing, to the disadvantage of the DHH. In 2018, South African Sign Language (SASL) became an official subject in South African schools, and in 2023 it became South Africa's 12th official language. However, these measures do not oblige institutions and service providers to offer services in SASL. Although some provision is made for the needs of DHH people in the form of sign language interpreters, such interpreters are not always readily available. They are also costly, charging in excess of R500.00 per hour. In this research, we developed the first vision-based Neural Sign Language Translation model for SASL as a first step towards bridging the communication gap. To this end, we recorded a sizeable parallel SASL and English corpus with the help of six sign language interpreters, three of whom are native signers and account for around 90% of the dataset. The dataset comprises 5047 sentences in the domain of government and politics, recorded in a studio setting against a uniform green background. At an average of 3.83 seconds per segment, this equates to around five hours of sign language data. We conducted comprehensive experiments with various visual feature extraction architectures as well as translation architectures, and found that recurrent translation models outperform transformer models. We also investigated pretraining our feature extractor on a Continuous Sign Language Recognition task before fine-tuning it on the SASL dataset, and found that this is effective for improving feature extraction. Our best models achieved a BLEU-4 score of 1.35 on the SASL test set, comparable to the best BLEU-4 score of 1.73 on the How2Sign dataset, but much lower than the 13.23 achieved on the RWTH-PHOENIX-Weather 2014T dataset without gloss supervision. Our experiments also showed that annotating fingerspelled words as individual letters improves model performance. Our model might benefit from the collection of more data and the addition of gloss annotations. Our results on the SASL dataset are still very poor and far from practical, indicating that more resources and experiments are required before the language barrier between the hearing and the Deaf can be removed. This would be most effectively achieved by working in collaboration with the Deaf community to produce high-quality datasets, annotations, and models.
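
To make the pipeline described above concrete, the following is a minimal PyTorch sketch of one plausible arrangement: a frame-level visual feature extractor (here a ResNet-18 stand-in) whose weights could be initialised from a Continuous Sign Language Recognition pretraining run, feeding a recurrent (GRU) encoder. All module choices, dimensions, and the checkpoint argument are illustrative assumptions and are not taken from the thesis.

# Hypothetical sketch, not the thesis implementation: per-frame visual
# features followed by a recurrent translation encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SignTranslationEncoder(nn.Module):
    def __init__(self, feature_dim=512, hidden_dim=256, cslr_checkpoint=None):
        super().__init__()
        # Frame-level feature extractor; in the thesis setting this would be
        # pretrained on a Continuous Sign Language Recognition task and then
        # fine-tuned on the SASL data.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()  # keep the 512-d pooled features
        if cslr_checkpoint is not None:
            # Assumed checkpoint path; loaded loosely since heads may differ.
            backbone.load_state_dict(torch.load(cslr_checkpoint), strict=False)
        self.backbone = backbone
        # Recurrent encoder over the sequence of per-frame features.
        self.rnn = nn.GRU(feature_dim, hidden_dim,
                          batch_first=True, bidirectional=True)

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        encoded, _ = self.rnn(feats)
        return encoded  # to be consumed by an attention-based decoder

# Example: encode a dummy clip of 16 RGB frames at 224x224.
clip = torch.randn(1, 16, 3, 224, 224)
print(SignTranslationEncoder()(clip).shape)  # torch.Size([1, 16, 512])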
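
The BLEU-4 figures quoted above are corpus-level scores. As a brief illustration, the sketch below computes corpus BLEU-4 with the sacrebleu library; the hypothesis and reference sentences are hypothetical placeholders, not items from the SASL corpus.

# Hypothetical example of corpus-level BLEU-4 scoring with sacrebleu.
from sacrebleu.metrics import BLEU

# Model outputs and their English references (placeholder sentences).
hypotheses = [
    "the minister answered questions in parliament",
    "the budget was approved last week",
]
references = [[
    "the minister answered questions in parliament on tuesday",
    "the national budget was approved last week",
]]

bleu = BLEU()  # default n-gram order is 4, i.e. BLEU-4
score = bleu.corpus_score(hypotheses, references)
print(round(score.score, 2))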