Crowdsourcing a text corpus for a low resource language

dc.contributor.advisorSuleman, Husseinen_ZA
dc.contributor.authorPackham, Seanen_ZA
dc.date.accessioned2016-07-18T12:55:04Z
dc.date.available2016-07-18T12:55:04Z
dc.date.issued2016en_ZA
dc.description.abstractLow resourced languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation that depend on them. Researchers have been unable to assemble isiXhosa corpora of sufficient size and quality to produce working machine translation systems and it has been acknowledged that there is little to know training data and sourcing translations from professionals can be a costly process. A crowdsourcing translation game which paid participants for their contributions was proposed as a solution to source original and relevant parallel corpora for low resource languages such as isiXhosa. The objectives of this dissertation is to report on the four experiments that were conducted to assess user motivation and contribution quantity under various scenarios using the developed crowdsourcing translation game. The first experiment was a pilot study to test a custom built system and to find out if social network users would volunteer to participate in a translation game for free. The second experiment tested multiple payment schemes with users from the University of Cape Town. The schemes rewarded users with consistent, increasing or decreasing amounts for subsequent contributions. Experiment 3 tested whether the same users from Experiment 2 would continue contributing if payments were taken away. The last experiment tested a payment scheme that did not offer a direct and guaranteed reward. Users were paid based on their leaderboard placement and only a limited number of the top leaderboard spots were allocated rewards. From experiment 1 and 3 we found that people do not volunteer without financial incentives, experiment 2 and 4 showed that people want increased rewards when putting in increased effort , experiment 3 also showed that people will not continue contributing if the financial incentives are taken away and experiment 4 also showed that the possibility of incentives is as attractive as offering guaranteed incentives .en_ZA
dc.identifier.apacitationPackham, S. (2016). <i>Crowdsourcing a text corpus for a low resource language</i>. (Thesis). University of Cape Town ,Faculty of Science ,Department of Computer Science. Retrieved from http://hdl.handle.net/11427/20436en_ZA
dc.identifier.chicagocitationPackham, Sean. <i>"Crowdsourcing a text corpus for a low resource language."</i> Thesis., University of Cape Town ,Faculty of Science ,Department of Computer Science, 2016. http://hdl.handle.net/11427/20436en_ZA
dc.identifier.citationPackham, S. 2016. Crowdsourcing a text corpus for a low resource language. University of Cape Town.en_ZA
dc.identifier.ris TY - Thesis / Dissertation AU - Packham, Sean AB - Low resourced languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation that depend on them. Researchers have been unable to assemble isiXhosa corpora of sufficient size and quality to produce working machine translation systems and it has been acknowledged that there is little to know training data and sourcing translations from professionals can be a costly process. A crowdsourcing translation game which paid participants for their contributions was proposed as a solution to source original and relevant parallel corpora for low resource languages such as isiXhosa. The objectives of this dissertation is to report on the four experiments that were conducted to assess user motivation and contribution quantity under various scenarios using the developed crowdsourcing translation game. The first experiment was a pilot study to test a custom built system and to find out if social network users would volunteer to participate in a translation game for free. The second experiment tested multiple payment schemes with users from the University of Cape Town. The schemes rewarded users with consistent, increasing or decreasing amounts for subsequent contributions. Experiment 3 tested whether the same users from Experiment 2 would continue contributing if payments were taken away. The last experiment tested a payment scheme that did not offer a direct and guaranteed reward. Users were paid based on their leaderboard placement and only a limited number of the top leaderboard spots were allocated rewards. From experiment 1 and 3 we found that people do not volunteer without financial incentives, experiment 2 and 4 showed that people want increased rewards when putting in increased effort , experiment 3 also showed that people will not continue contributing if the financial incentives are taken away and experiment 4 also showed that the possibility of incentives is as attractive as offering guaranteed incentives . DA - 2016 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2016 T1 - Crowdsourcing a text corpus for a low resource language TI - Crowdsourcing a text corpus for a low resource language UR - http://hdl.handle.net/11427/20436 ER - en_ZA
dc.identifier.urihttp://hdl.handle.net/11427/20436
dc.identifier.vancouvercitationPackham S. Crowdsourcing a text corpus for a low resource language. [Thesis]. University of Cape Town ,Faculty of Science ,Department of Computer Science, 2016 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/20436en_ZA
dc.language.isoengen_ZA
dc.publisher.departmentDepartment of Computer Scienceen_ZA
dc.publisher.facultyFaculty of Scienceen_ZA
dc.publisher.institutionUniversity of Cape Town
dc.subject.otherComputer Scienceen_ZA
dc.titleCrowdsourcing a text corpus for a low resource languageen_ZA
dc.typeMaster Thesis
dc.type.qualificationlevelMasters
dc.type.qualificationnameMScen_ZA
uct.type.filetypeText
uct.type.filetypeImage
uct.type.publicationResearchen_ZA
uct.type.resourceThesisen_ZA
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
thesis_sci_2016_packham_sean__.pdf
Size:
2.13 MB
Format:
Adobe Portable Document Format
Description:
Collections