Investigating language preferences in improving multilingual Swahili information retrieval

dc.contributor.advisor: Suleman, Hussein
dc.contributor.author: Telemala, Joseph Philipo
dc.date.accessioned: 2022-06-29T10:45:14Z
dc.date.available: 2022-06-29T10:45:14Z
dc.date.issued: 2022
dc.date.updated: 2022-06-29T10:44:53Z
dc.description.abstract: Multilingual Information Retrieval (MLIR) systems are designed to retrieve information in multiple languages in response to a query posed in another language or in one of the languages in which a user is looking for information. Researchers have proposed several approaches for combining the individual result lists into a single result list. Some are heuristics, such as round-robin, in which a result is drawn from each result list one at a time until all lists are exhausted, while others are Machine Learning (ML)-based, in which a model is trained using a variety of features from the query and the required documents. These approaches strive for topical relevance, which is the most important goal in satisfying users' information needs. However, multilingual speakers exhibit a variety of behaviours, some of which are unique to certain individuals based on their historical, cultural, and linguistic backgrounds. Unfortunately, these behaviours are ignored in current MLIR system design and implementation: current MLIR systems do not take people's language preferences into account when ranking results. Studies have shown that users have different language preferences depending on their search topics, known as Topic-Language (T-L) preferences. This study proposes using T-L preferences to improve the relevance of the ranked MLIR results. To achieve this aim, we used a survey-based study to understand the information needs and Web search behaviour of Swahili-speaking Web users in Tanzania. One prominent behaviour of such multilingual Web users that emerged is code-switching. Several factors, such as information context and search topic, were identified as reasons for such frequent language switching. We then created a prototype multilingual search engine with which users interacted in order to quantify how much the language of the query or the selected results is influenced by the search topic.
Using the resulting query and click-through logs, we estimated the relationship between the topic of search and the language of the query and of the clicked results. The findings revealed that Swahili-speaking Web users have language preferences for certain topics. For example, Kiswahili was significantly preferred as a results language in only 9% of the examined topics, English was preferred in 26% of the topics, and there was no preference for language of results in the remaining 65% of the topics. Based on these findings, we created the T-L-based algorithm, which re-ranks the results based on T-L associations/preferences. We evaluated our proposed T-L-based algorithm using click-through logs from our prototype guided multilingual search engine. The results show that incorporating language preferences into the ranking model significantly improves the relevance of MLIR results in some specific cases. The strength of the T-L association and the number of relevant results in the preferred language's list were found to be the driving factors in the performance improvement of the T-L-based algorithm. This thesis provides evidence that using language preferences can potentially improve the relevance of MLIR results for some topics that are preferentially expressed in specific languages. This is important in communities where information search and access are hampered by a variety of factors and there is a clear lineage in language use, where MLIR's topical relevance alone may not be sufficient.
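The round-robin heuristic mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `round_robin_merge` and the document identifiers are hypothetical, and the sketch assumes each per-language result list is already ranked and that the lists share no documents.

```python
from itertools import zip_longest

# Sentinel used to pad shorter lists; it never appears in real results.
_MISSING = object()


def round_robin_merge(result_lists):
    """Merge ranked result lists by drawing one result from each list in turn.

    Assumption (not from the thesis): each inner list is already ranked
    best-first, and the lists are visited in the order they are given.
    """
    merged = []
    # zip_longest yields one "round" per rank position across all lists,
    # continuing until the longest list is exhausted.
    for one_round in zip_longest(*result_lists, fillvalue=_MISSING):
        for doc in one_round:
            if doc is not _MISSING:  # skip lists that have run out
                merged.append(doc)
    return merged


# Hypothetical document identifiers for illustration only.
swahili_results = ["sw1", "sw2", "sw3"]
english_results = ["en1", "en2"]
print(round_robin_merge([swahili_results, english_results]))
```

Because this interleaving ignores both the search topic and the user's language preference, it is the kind of baseline that a preference-aware re-ranking, such as the T-L-based algorithm described above, is meant to improve upon.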
dc.identifier.apacitation [en_ZA]: Telemala, J. P. (2022). Investigating language preferences in improving multilingual Swahili information retrieval. Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/36568
dc.identifier.chicagocitation [en_ZA]: Telemala, Joseph Philipo. "Investigating language preferences in improving multilingual Swahili information retrieval." Faculty of Science, Department of Computer Science, 2022. http://hdl.handle.net/11427/36568
dc.identifier.citation [en_ZA]: Telemala, J.P. 2022. Investigating language preferences in improving multilingual Swahili information retrieval. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/36568
dc.identifier.ris [en_ZA]:
TY - Doctoral Thesis
AU - Telemala, Joseph Philipo
DA - 2022
DB - OpenUCT
DP - University of Cape Town
KW - computer science
LK - https://open.uct.ac.za
PY - 2022
T1 - Investigating language preferences in improving multilingual Swahili information retrieval
TI - Investigating language preferences in improving multilingual Swahili information retrieval
UR - http://hdl.handle.net/11427/36568
ER -
dc.identifier.uri: http://hdl.handle.net/11427/36568
dc.identifier.vancouvercitation [en_ZA]: Telemala JP. Investigating language preferences in improving multilingual Swahili information retrieval. Faculty of Science, Department of Computer Science, 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/36568
dc.language.rfc3066: eng
dc.publisher.department: Department of Computer Science
dc.publisher.faculty: Faculty of Science
dc.subject: computer science
dc.title: Investigating language preferences in improving multilingual Swahili information retrieval
dc.type: Doctoral Thesis
dc.type.qualificationlevel: Doctoral
dc.type.qualificationlevel: PhD
Files
Original bundle
Name: thesis_sci_2022_telemala joseph philipo.pdf
Size: 3.38 MB
Format: Adobe Portable Document Format