Semi-automatic matching of semi-structured data updates

dc.contributor.advisor: Berman, Sonia [en_ZA]
dc.contributor.author: Forshaw, Gareth William [en_ZA]
dc.date.accessioned: 2015-05-27T04:11:15Z
dc.date.available: 2015-05-27T04:11:15Z
dc.date.issued: 2014 [en_ZA]
dc.description: Includes bibliographical references. [en_ZA]
dc.description.abstract: Data matching, also referred to as data linkage or field matching, is a technique used to combine multiple data sources into one data set. Data matching is used for data integration in a number of sectors and industries, from politics and health care to scientific applications. The motivation for this study was the observation of the day-to-day struggles of a large non-governmental organisation (NGO) in managing its membership database. With a membership base of close to 2.4 million, the challenges it faces in capturing and processing semi-structured membership updates are monumental. Updates arrive from the field in a multitude of formats, often incomplete and unstructured, and expert knowledge is geographically localised. These issues are compounded by an extremely complex organisational hierarchy and a general lack of data validation processes. An online system was proposed for pre-processing input and then matching it against the membership database. Termed the Data Pre-Processing and Matching System (DPPMS), it allows for single or bulk updates. Based on the success of the DPPMS with the NGO’s membership database, it was subsequently used for pre-processing and data matching of semi-structured patient and financial customer data. Using the semi-automated DPPMS rather than a clerical data matching system, true positive matches increased by 21% while false negative matches decreased by 20%. The Recall, Precision and F-Measure values all improved and the risk of false positives diminished. The DPPMS was unable to match approximately 8% of provided records; this was largely due to human error during initial data capture. While the DPPMS greatly diminished the reliance on experts, their role remained pivotal during the final stage of the process. [en_ZA]
dc.identifier.apacitation: Forshaw, G. W. (2014). Semi-automatic matching of semi-structured data updates. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/12930 [en_ZA]
dc.identifier.chicagocitation: Forshaw, Gareth William. "Semi-automatic matching of semi-structured data updates." Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2014. http://hdl.handle.net/11427/12930 [en_ZA]
dc.identifier.citation: Forshaw, G. W. 2014. Semi-automatic matching of semi-structured data updates. Thesis. University of Cape Town, Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/12930 [en_ZA]
dc.identifier.ris:
TY - Thesis / Dissertation
AU - Forshaw, Gareth William
DA - 2014
DB - OpenUCT
DP - University of Cape Town
LK - https://open.uct.ac.za
PB - University of Cape Town
PY - 2014
T1 - Semi-automatic matching of semi-structured data updates
TI - Semi-automatic matching of semi-structured data updates
UR - http://hdl.handle.net/11427/12930
ER -
[en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/12930
dc.identifier.vancouvercitation: Forshaw GW. Semi-automatic matching of semi-structured data updates. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2014 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/12930 [en_ZA]
dc.language.iso: eng [en_ZA]
dc.publisher.department: Department of Computer Science [en_ZA]
dc.publisher.faculty: Faculty of Science [en_ZA]
dc.publisher.institution: University of Cape Town
dc.subject.other: Information Technology [en_ZA]
dc.title: Semi-automatic matching of semi-structured data updates [en_ZA]
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc [en_ZA]
uct.type.filetype: Text
uct.type.filetype: Image
uct.type.publication: Research [en_ZA]
uct.type.resource: Thesis [en_ZA]
Files
Original bundle
Name: thesis_sci_2014_forshaw_gw.pdf
Size: 1.39 MB
Format: Adobe Portable Document Format