Evaluating automated and hybrid neural disambiguation for African historical named entities

dc.contributor.advisor: Suleman, Hussein
dc.contributor.author: Dunn, Jarryd
dc.date.accessioned: 2023-02-15T06:44:36Z
dc.date.available: 2023-02-15T06:44:36Z
dc.date.issued: 2022
dc.date.updated: 2023-02-15T06:43:46Z
dc.description.abstract: Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system.
dc.identifier.apacitation: Dunn, J. (2022). Evaluating automated and hybrid neural disambiguation for African historical named entities. Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/36921 [en_ZA]
dc.identifier.chicagocitation: Dunn, Jarryd. "Evaluating automated and hybrid neural disambiguation for African historical named entities." Faculty of Science, Department of Statistical Sciences, 2022. http://hdl.handle.net/11427/36921 [en_ZA]
dc.identifier.citation: Dunn, J. 2022. Evaluating automated and hybrid neural disambiguation for African historical named entities. Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/36921 [en_ZA]
dc.identifier.ris [en_ZA]:
TY  - Master Thesis
AU  - Dunn, Jarryd
AB  - Documents detailing South African history contain ambiguous names. Ambiguous names may be due to people having the same name or the same person being referred to by multiple different names. Thus when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the 500 Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. At the same time, the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on the documents written in English. Thus the system was augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance when compared to the unaugmented NED system.
DA  - 2022
DB  - OpenUCT
DP  - University of Cape Town
KW  - data science
LK  - https://open.uct.ac.za
PY  - 2022
T1  - Evaluating automated and hybrid neural disambiguation for African historical named entities
TI  - Evaluating automated and hybrid neural disambiguation for African historical named entities
UR  - http://hdl.handle.net/11427/36921
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/36921
dc.identifier.vancouvercitation: Dunn J. Evaluating automated and hybrid neural disambiguation for African historical named entities. Faculty of Science, Department of Statistical Sciences, 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/36921 [en_ZA]
dc.language.rfc3066: eng
dc.publisher.department: Department of Statistical Sciences
dc.publisher.faculty: Faculty of Science
dc.subject: data science
dc.title: Evaluating automated and hybrid neural disambiguation for African historical named entities
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2022_dunn jarryd.pdf
Size: 2.95 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission