Natural language interface to relational database: a simplified customization approach

Mvumbi, Tresor

dc.contributor.advisor	Keet, Maria	en_ZA
dc.contributor.advisor	Bagula, Antoine	en_ZA
dc.contributor.author	Mvumbi, Tresor	en_ZA
dc.date.accessioned	2017-01-25T14:11:07Z
dc.date.available	2017-01-25T14:11:07Z
dc.date.issued	2016	en_ZA
dc.description.abstract	Natural language interfaces to databases (NLIDB) allow end-users with no knowledge of a formal language like SQL to query databases. One of the main open problems currently investigated is the development of NLIDB systems that are easily portable across several domains. The present study focuses on the development and evaluation of methods that simplify the customization of NLIDBs targeting relational databases without sacrificing coverage and accuracy. This goal is approached by introducing two authoring frameworks that aim to reduce the workload required to port an NLIDB to a new domain. The first authoring approach, called top-down, assumes the existence of a corpus of unannotated natural language sample questions, which is used to pre-harvest key lexical terms to simplify customization. The top-down approach further reduces the configuration workload by automatically including the semantics for the negative forms of verbs and the comparative and superlative forms of adjectives in the configuration model. The second authoring approach, called bottom-up, explores the possibility of building a configuration model with no manual customization, using the information from the database schema and an off-the-shelf dictionary. The evaluation of the prototype system with geo-query, a benchmark query corpus, showed that the top-down approach significantly reduces the customization workload: 93% of the entries defining the meaning of verbs and adjectives, which represent the hard work, were automatically generated by the system; only 26 straightforward mappings and 3 manual definitions of meaning were required for customization. The top-down approach correctly answered 74.5% of the questions. The bottom-up approach, however, correctly answered only 1/3 of the questions due to an insufficient lexicon and missing semantics. The use of an external lexicon did not improve the system's accuracy. The bottom-up model nevertheless correctly answered 3/4 of the 105 simple retrieval questions in the query corpus that do not require nesting. Therefore, the bottom-up approach can be useful for building an initial lightweight configuration model that can be incrementally refined, for example by using the failed queries to train a top-down model. The experimental results for top-down suggest that it is indeed possible to construct a portable NLIDB that reduces the configuration effort while maintaining decent coverage and accuracy.	en_ZA
dc.identifier.apacitation	Mvumbi, T. (2016). Natural language interface to relational database: a simplified customization approach. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/23058	en_ZA
dc.identifier.chicagocitation	Mvumbi, Tresor. "Natural language interface to relational database: a simplified customization approach." Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2016. http://hdl.handle.net/11427/23058	en_ZA
dc.identifier.citation	Mvumbi, T. 2016. Natural language interface to relational database: a simplified customization approach. University of Cape Town.	en_ZA
dc.identifier.ris	TY - Thesis / Dissertation AU - Mvumbi, Tresor DA - 2016 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2016 T1 - Natural language interface to relational database: a simplified customization approach TI - Natural language interface to relational database: a simplified customization approach UR - http://hdl.handle.net/11427/23058 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/23058
dc.identifier.vancouvercitation	Mvumbi T. Natural language interface to relational database: a simplified customization approach. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2016 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/23058	en_ZA
dc.language.iso	eng	en_ZA
dc.publisher.department	Department of Computer Science	en_ZA
dc.publisher.faculty	Faculty of Science	en_ZA
dc.publisher.institution	University of Cape Town
dc.subject.other	Computer Science	en_ZA
dc.title	Natural language interface to relational database: a simplified customization approach	en_ZA
dc.type	Master Thesis
dc.type.qualificationlevel	Masters
dc.type.qualificationname	MSc	en_ZA
uct.type.filetype	Text
uct.type.filetype	Image
uct.type.publication	Research	en_ZA
uct.type.resource	Thesis	en_ZA

Files

Original bundle

Name: thesis_sci_2016_mvumbi_tresor.pdf
Size: 2.14 MB
Format: Adobe Portable Document Format

Collections

Masters