Evaluating automated and hybrid neural disambiguation for African historical named entities
| dc.contributor.advisor | Suleman, Hussein | |
| dc.contributor.author | Dunn, Jarryd | |
| dc.date.accessioned | 2023-02-15T06:44:36Z | |
| dc.date.available | 2023-02-15T06:44:36Z | |
| dc.date.issued | 2022 | |
| dc.date.updated | 2023-02-15T06:43:46Z | |
| dc.description.abstract | Documents detailing South African history contain ambiguous names. A name may be ambiguous because several people share it, or because the same person is referred to by several different names. Thus, when searching for, or attempting to extract information about, a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system, which disambiguates names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus, a multilingual language-model-based NED system was developed to disambiguate people's names within a historical South African context, using documents written in English and isiZulu from the Five Hundred Year Archive (FHYA). The multilingual language-model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. In addition, the entity linking component linked 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on documents written in English. The system was therefore augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance compared to the unaugmented NED system. | |
| dc.identifier.apacitation | Dunn, J. (2022). <i>Evaluating automated and hybrid neural disambiguation for African historical named entities</i> (Master's thesis). University of Cape Town, Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/36921 | en_ZA |
| dc.identifier.chicagocitation | Dunn, Jarryd. <i>"Evaluating automated and hybrid neural disambiguation for African historical named entities."</i> Master's thesis, University of Cape Town, Faculty of Science, Department of Statistical Sciences, 2022. http://hdl.handle.net/11427/36921 | en_ZA |
| dc.identifier.citation | Dunn, J. 2022. Evaluating automated and hybrid neural disambiguation for African historical named entities. Master's thesis. University of Cape Town, Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/36921 | en_ZA |
| dc.identifier.ris | TY - Master Thesis AU - Dunn, Jarryd AB - Documents detailing South African history contain ambiguous names. A name may be ambiguous because several people share it, or because the same person is referred to by several different names. Thus, when searching for, or attempting to extract information about, a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system, which disambiguates names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus, a multilingual language-model-based NED system was developed to disambiguate people's names within a historical South African context, using documents written in English and isiZulu from the Five Hundred Year Archive (FHYA). The multilingual language-model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726. In addition, the entity linking component linked 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on documents written in English. The system was therefore augmented with handcrafted rules to improve its performance. The addition of handcrafted rules resulted in a small but significant improvement in performance compared to the unaugmented NED system. DA - 2022 DB - OpenUCT DP - University of Cape Town KW - data science LK - https://open.uct.ac.za PY - 2022 T1 - Evaluating automated and hybrid neural disambiguation for African historical named entities TI - Evaluating automated and hybrid neural disambiguation for African historical named entities UR - http://hdl.handle.net/11427/36921 ER - | en_ZA |
| dc.identifier.uri | http://hdl.handle.net/11427/36921 | |
| dc.identifier.vancouvercitation | Dunn J. Evaluating automated and hybrid neural disambiguation for African historical named entities [Master's thesis]. University of Cape Town, Faculty of Science, Department of Statistical Sciences; 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/36921 | en_ZA |
| dc.language.rfc3066 | eng | |
| dc.publisher.department | Department of Statistical Sciences | |
| dc.publisher.faculty | Faculty of Science | |
| dc.subject | data science | |
| dc.title | Evaluating automated and hybrid neural disambiguation for African historical named entities | |
| dc.type | Master Thesis | |
| dc.type.qualificationlevel | Masters | |
| dc.type.qualificationlevel | MSc | |