Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework

dc.contributor.advisor: Suleman, Hussein [en_ZA]
dc.contributor.author: Munyaradzi, Ngoni [en_ZA]
dc.date.accessioned: 2014-08-20T19:30:39Z
dc.date.available: 2014-08-20T19:30:39Z
dc.date.issued: 2013 [en_ZA]
dc.description.abstract: The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of the manuscripts the recognition accuracy was low. In this research, a crowdsourcing-based method is proposed to transcribe the historical handwritten manuscripts, where volunteers transcribe the notebooks online. An online crowdsourcing transcription tool was developed and deployed. Experiments were conducted to determine the quality of transcriptions and accuracy of the volunteers compared with a gold standard. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for ǀXam text and 95% for English text. When the ǀXam text transcriptions produced by the volunteers are compared with the gold standard, the volunteers achieve an average accuracy of 69.69%. Findings show that there exists a positive linear correlation between the inter-transcriber agreement and the accuracy of transcriptions. The user survey revealed that volunteers found the transcription process enjoyable, though it was difficult. Results indicate that volunteer thinking can be used to crowdsource intellectually intensive tasks in digital libraries, such as transcription of handwritten manuscripts. Volunteer thinking outperforms machine learning techniques at the task of transcribing notebooks from the Bleek and Lloyd Collection. [en_ZA]
dc.identifier.apacitation: Munyaradzi, N. (2013). Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/6640 [en_ZA]
dc.identifier.chicagocitation: Munyaradzi, Ngoni. "Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework." Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2013. http://hdl.handle.net/11427/6640 [en_ZA]
dc.identifier.citation: Munyaradzi, N. 2013. Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework. University of Cape Town. [en_ZA]
dc.identifier.ris: [en_ZA]
TY - Thesis / Dissertation
AU - Munyaradzi, Ngoni
AB - The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of the manuscripts the recognition accuracy was low. In this research, a crowdsourcing-based method is proposed to transcribe the historical handwritten manuscripts, where volunteers transcribe the notebooks online. An online crowdsourcing transcription tool was developed and deployed. Experiments were conducted to determine the quality of transcriptions and accuracy of the volunteers compared with a gold standard. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for ǀXam text and 95% for English text. When the ǀXam text transcriptions produced by the volunteers are compared with the gold standard, the volunteers achieve an average accuracy of 69.69%. Findings show that there exists a positive linear correlation between the inter-transcriber agreement and the accuracy of transcriptions. The user survey revealed that volunteers found the transcription process enjoyable, though it was difficult. Results indicate that volunteer thinking can be used to crowdsource intellectually intensive tasks in digital libraries, such as transcription of handwritten manuscripts. Volunteer thinking outperforms machine learning techniques at the task of transcribing notebooks from the Bleek and Lloyd Collection.
DA - 2013
DB - OpenUCT
DP - University of Cape Town
LK - https://open.uct.ac.za
PB - University of Cape Town
PY - 2013
T1 - Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework
TI - Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework
UR - http://hdl.handle.net/11427/6640
ER -
dc.identifier.uri: http://hdl.handle.net/11427/6640
dc.identifier.vancouvercitation: Munyaradzi N. Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2013 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/6640 [en_ZA]
dc.language.iso: eng [en_ZA]
dc.publisher.department: Department of Computer Science [en_ZA]
dc.publisher.faculty: Faculty of Science [en_ZA]
dc.publisher.institution: University of Cape Town
dc.title: Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework [en_ZA]
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc [en_ZA]
uct.type.filetype: Text
uct.type.filetype: Image
uct.type.publication: Research [en_ZA]
uct.type.resource: Thesis [en_ZA]
Files
Original bundle
Name: thesis_sci_2013_munyaradzi_ngoni.pdf
Size: 2.47 MB
Format: Adobe Portable Document Format