Vision-Based automatic translation for South African Sign Language (SASL)

dc.contributor.advisor: Verrinder, Robyn
dc.contributor.advisor: Tsoeu, Mohohlo
dc.contributor.author: Setshekgamollo, Mokgadi
dc.date.accessioned: 2025-04-03T12:21:55Z
dc.date.available: 2025-04-03T12:21:55Z
dc.date.issued: 2024
dc.date.updated: 2025-04-03T12:18:48Z
dc.description.abstract: There are more than four million South Africans who are deaf and hard of hearing (DHH). However, most hearing people neither understand sign language nor know how to sign, creating a communication barrier between the DHH and the hearing, to the disadvantage of the DHH. In 2018, South African Sign Language (SASL) became an official subject in South African schools, and in 2023 it became South Africa's 12th official language. However, these measures do not require institutions and service providers to offer it. Although some provision is made for the needs of DHH people in the form of sign language interpreters, such interpreters are not always readily available and are costly, charging in excess of R500.00 per hour. In this research, we developed the first vision-based Neural Sign Language Translation model for SASL as a first step towards bridging the communication gap. To this end, we recorded a sizeable parallel SASL and English corpus with the help of six sign language interpreters, three of whom are native signers and account for around 90% of the dataset. The dataset comprises 5047 sentences in the domain of government and politics, recorded in a studio setting against a uniform green background. At an average of 3.83 seconds per segment, this equates to around five hours of sign language data. We conducted comprehensive experiments with various visual feature extraction and translation architectures, and found that recurrent translation models outperform transformer models. We also investigated pretraining our feature extractor on a Continuous Sign Language Recognition task before fine-tuning it on the SASL dataset, and found this effective for improving feature extraction. Our best models achieved a BLEU-4 score of 1.35 on the SASL test set, comparable to the best BLEU-4 score of 1.73 reported on the How2Sign dataset, but much lower than the 13.23 achieved on the RWTH-PHOENIX-Weather 2014T dataset without gloss supervision. Our experiments also showed that annotating fingerspelled words as individual letters improves model performance. Our model might benefit from the collection of more data and the addition of gloss annotation. Our results on the SASL dataset remain far from practical use, indicating that more resources and experiments are required before the language barrier between the hearing and the Deaf can be removed. This would be most effectively achieved by working in collaboration with the Deaf community to produce high-quality datasets, annotations, and models. (The BLEU-4 metric used here is illustrated in a short sketch after the metadata fields below.)
dc.identifier.apacitation: Setshekgamollo, M. (2024). Vision-Based automatic translation for South African Sign Language (SASL) (Master's thesis). University of Cape Town, Faculty of Engineering and the Built Environment, Department of Electrical Engineering. Retrieved from http://hdl.handle.net/11427/41343
dc.identifier.chicagocitation: Setshekgamollo, Mokgadi. "Vision-Based automatic translation for South African Sign Language (SASL)." Master's thesis, University of Cape Town, Faculty of Engineering and the Built Environment, Department of Electrical Engineering, 2024. http://hdl.handle.net/11427/41343
dc.identifier.citation: Setshekgamollo, M. 2024. Vision-Based automatic translation for South African Sign Language (SASL). Master's thesis. University of Cape Town, Faculty of Engineering and the Built Environment, Department of Electrical Engineering. http://hdl.handle.net/11427/41343
dc.identifier.ris:
TY - Thesis / Dissertation
AU - Setshekgamollo, Mokgadi
AB - There are more than four million South Africans who are deaf and hard of hearing (DHH). However, most hearing people neither understand sign language nor know how to sign, creating a communication barrier between the DHH and the hearing, to the disadvantage of the DHH. In 2018, South African Sign Language (SASL) became an official subject in South African schools, and in 2023 it became South Africa's 12th official language. However, these measures do not require institutions and service providers to offer it. Although some provision is made for the needs of DHH people in the form of sign language interpreters, such interpreters are not always readily available and are costly, charging in excess of R500.00 per hour. In this research, we developed the first vision-based Neural Sign Language Translation model for SASL as a first step towards bridging the communication gap. To this end, we recorded a sizeable parallel SASL and English corpus with the help of six sign language interpreters, three of whom are native signers and account for around 90% of the dataset. The dataset comprises 5047 sentences in the domain of government and politics, recorded in a studio setting against a uniform green background. At an average of 3.83 seconds per segment, this equates to around five hours of sign language data. We conducted comprehensive experiments with various visual feature extraction and translation architectures, and found that recurrent translation models outperform transformer models. We also investigated pretraining our feature extractor on a Continuous Sign Language Recognition task before fine-tuning it on the SASL dataset, and found this effective for improving feature extraction. Our best models achieved a BLEU-4 score of 1.35 on the SASL test set, comparable to the best BLEU-4 score of 1.73 reported on the How2Sign dataset, but much lower than the 13.23 achieved on the RWTH-PHOENIX-Weather 2014T dataset without gloss supervision. Our experiments also showed that annotating fingerspelled words as individual letters improves model performance. Our model might benefit from the collection of more data and the addition of gloss annotation. Our results on the SASL dataset remain far from practical use, indicating that more resources and experiments are required before the language barrier between the hearing and the Deaf can be removed. This would be most effectively achieved by working in collaboration with the Deaf community to produce high-quality datasets, annotations, and models.
DA - 2024
DB - OpenUCT
DP - University of Cape Town
KW - Engineering
LK - https://open.uct.ac.za
PB - University of Cape Town
PY - 2024
T1 - Vision-Based automatic translation for South African Sign Language (SASL)
TI - Vision-Based automatic translation for South African Sign Language (SASL)
UR - http://hdl.handle.net/11427/41343
ER -
dc.identifier.uri: http://hdl.handle.net/11427/41343
dc.identifier.vancouvercitation: Setshekgamollo M. Vision-Based automatic translation for South African Sign Language (SASL) [Master's thesis]. University of Cape Town, Faculty of Engineering and the Built Environment, Department of Electrical Engineering; 2024 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41343
dc.language.iso: en
dc.language.rfc3066: eng
dc.publisher.department: Department of Electrical Engineering
dc.publisher.faculty: Faculty of Engineering and the Built Environment
dc.publisher.institution: University of Cape Town
dc.subject: Engineering
dc.title: Vision-Based automatic translation for South African Sign Language (SASL)
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
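
Note on the evaluation metric: the abstract reports corpus-level BLEU-4 scores, i.e. modified n-gram precision up to 4-grams with a brevity penalty, on a 0-100 scale. The following is a minimal, hypothetical Python sketch of how such a score is typically computed with the sacrebleu library; the example sentences are invented placeholders and are not taken from the thesis or its dataset.

    # Minimal sketch: corpus-level BLEU-4 with sacrebleu (assumed installed).
    # The sentences below are invented placeholders, not thesis data.
    import sacrebleu

    # Hypothetical model outputs for a small test set.
    hypotheses = [
        "the minister addressed parliament today",
        "the government announced a new policy",
    ]

    # One hypothetical English reference translation per test sentence.
    references = [
        "the minister spoke to parliament this afternoon",
        "the government announced a new policy yesterday",
    ]

    # corpus_bleu takes the hypotheses and a list of reference streams;
    # n-grams up to order 4 (BLEU-4) are sacrebleu's default.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])

    # sacrebleu reports BLEU on a 0-100 scale, matching scores such as
    # the 1.35 and 13.23 quoted in the abstract.
    print(f"BLEU-4: {bleu.score:.2f}")

On this scale, a score of 1.35 means only a small fraction of the generated n-grams overlap with the references, which is why the abstract characterises the results as far from practical use.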
Files
Original bundle
Name: thesis_ebe_2024_setshekgamollo mokgadi.pdf
Size: 7.55 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed to upon submission