Mixed-Language Arabic-English Information Retrieval

Mustafa, Ali Mohammed

dc.contributor.advisor	Suleman, Hussein	en_ZA
dc.contributor.author	Mustafa, Ali Mohammed	en_ZA
dc.date.accessioned	2014-08-13T19:31:35Z
dc.date.available	2014-08-13T19:31:35Z
dc.date.issued	2013	en_ZA
dc.description	Includes abstract.	en_ZA
dc.description	Includes bibliographical references.	en_ZA
dc.description.abstract	This thesis attempts to address the problem of mixed querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents, regardless of their languages. To achieve this goal, however, it is first essential to suppress the impact of most of the problems that are caused by the mixed-language feature in both queries and documents and that bias the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, the term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of languages, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in the non-English language) would likely be overweighted and skew the impact of the technical terms (mostly those in English) due to the high document frequencies (and thus low weights) of the latter terms in their corresponding collection (mostly the English collection). This phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes a re-weighted Inverse Document Frequency (IDF) so as to moderate the effect of overweighted terms in mixed queries.	en_ZA
dc.identifier.apacitation	Mustafa, A. M. (2013). <i>Mixed-Language Arabic-English Information Retrieval</i>. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/6421	en_ZA
dc.identifier.chicagocitation	Mustafa, Ali Mohammed. <i>"Mixed-Language Arabic-English Information Retrieval."</i> Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2013. http://hdl.handle.net/11427/6421	en_ZA
dc.identifier.citation	Mustafa, A. 2013. Mixed-Language Arabic-English Information Retrieval. University of Cape Town.	en_ZA
dc.identifier.ris	TY - Thesis / Dissertation AU - Mustafa, Ali Mohammed AB - This thesis attempts to address the problem of mixed querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents, regardless of their languages. To achieve this goal, however, it is first essential to suppress the impact of most of the problems that are caused by the mixed-language feature in both queries and documents and that bias the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, the term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of languages, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in the non-English language) would likely be overweighted and skew the impact of the technical terms (mostly those in English) due to the high document frequencies (and thus low weights) of the latter terms in their corresponding collection (mostly the English collection). This phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes a re-weighted Inverse Document Frequency (IDF) so as to moderate the effect of overweighted terms in mixed queries. DA - 2013 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2013 T1 - Mixed-Language Arabic-English Information Retrieval TI - Mixed-Language Arabic-English Information Retrieval UR - http://hdl.handle.net/11427/6421 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/6421
dc.identifier.vancouvercitation	Mustafa AM. Mixed-Language Arabic-English Information Retrieval. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2013 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/6421	en_ZA
dc.language.iso	eng	en_ZA
dc.publisher.department	Department of Computer Science	en_ZA
dc.publisher.faculty	Faculty of Science	en_ZA
dc.publisher.institution	University of Cape Town
dc.subject.other	Computer Science	en_ZA
dc.title	Mixed-Language Arabic-English Information Retrieval	en_ZA
dc.type	Doctoral Thesis
dc.type.qualificationlevel	Doctoral
dc.type.qualificationname	PhD	en_ZA
uct.type.filetype	Text
uct.type.filetype	Image
uct.type.publication	Research	en_ZA
uct.type.resource	Thesis	en_ZA

Files

Original bundle

Name: thesis_sci_2013_ali_mohammed_mustafa.pdf
Size: 3.07 MB
Format: Adobe Portable Document Format

Collections

PhD / Doctoral