Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)

dc.contributor.advisor: Georg, Co-Pierre
dc.contributor.author: Favish, Ashleigh
dc.date.accessioned: 2020-02-28T11:46:12Z
dc.date.available: 2020-02-28T11:46:12Z
dc.date.issued: 2019
dc.date.updated: 2020-02-28T11:09:38Z
dc.description.abstract: The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry currently faces a backlog in registering an estimated 900,000 title deeds. Providing formal ownership, through title, is seen as necessary for unlocking the 'dead capital' of unregistered property, fostering access to capital markets and poverty alleviation. Within the current legislative framework, the Deeds Registry only accepts paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the language used in title deeds, text analysis and parsing are done using regular expressions (Regex). Uploading the scanned title deeds to IPFS is included in the pipeline as an additional security measure. Previous research has not applied these techniques to formal land registration or to other South African government institutions. The preliminary results show that this pipeline has an overall accuracy of 89.6%. This figure compares the expected output with the output extracted using OCR. The results are significantly less accurate when classifying handwritten and stamped information, so further measures are required to increase accuracy for these fields. The OCR accuracy was 98.3% for the fields extracted from typed text characters, which is within the accuracy range of manual data capture. A secondary quality check, which is currently done on manual data capture, would still be necessary to ensure accuracy of inputs. Overall, it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline its processes while ensuring title deed validity.
dc.identifier.apacitation: Favish, A. (2019). Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR). Faculty of Commerce, African Institute of Financial Markets and Risk Management. Retrieved from http://hdl.handle.net/11427/31389 [en_ZA]
dc.identifier.chicagocitation: Favish, Ashleigh. "Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)." Faculty of Commerce, African Institute of Financial Markets and Risk Management, 2019. http://hdl.handle.net/11427/31389 [en_ZA]
dc.identifier.citation: Favish, A. 2019. Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR). Faculty of Commerce, African Institute of Financial Markets and Risk Management. http://hdl.handle.net/11427/31389 [en_ZA]
dc.identifier.ris:
TY - Thesis / Dissertation
AU - Favish, Ashleigh
DA - 2019
DB - OpenUCT
DP - University of Cape Town
KW - Financial Technology
LK - https://open.uct.ac.za
PY - 2019
T1 - Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)
TI - Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)
UR - http://hdl.handle.net/11427/31389
ER -
[en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/31389
dc.identifier.vancouvercitation: Favish A. Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR). Faculty of Commerce, African Institute of Financial Markets and Risk Management, 2019 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/31389 [en_ZA]
dc.language.rfc3066: eng
dc.publisher.department: African Institute of Financial Markets and Risk Management
dc.publisher.faculty: Faculty of Commerce
dc.subject: Financial Technology
dc.title: Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MPhil
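
The abstract describes parsing OCR output with Regex and reporting accuracy by comparing expected output with extracted output. The record does not give the thesis's actual expressions, field list, or scoring rule, so the following is only a minimal sketch under stated assumptions: the field names, the patterns, the sample text, and the exact-match scoring are all hypothetical illustrations, not the author's method.

```python
import re

# Hypothetical field patterns; the expressions used in the thesis are not
# given in this record.
FIELD_PATTERNS = {
    "title_number": re.compile(r"\bT\s?(\d+/\d{4})\b"),
    "erf_number": re.compile(r"\bERF\s+(\d+)\b", re.IGNORECASE),
    "extent": re.compile(r"\b(\d+)\s+square\s+met(?:re|er)s\b", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match of each pattern, or None when nothing matches."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def field_accuracy(expected, extracted):
    """Share of expected fields whose extracted value matches exactly.

    Exact matching per field is an assumption made for this sketch.
    """
    if not expected:
        return 0.0
    correct = sum(
        1 for key, value in expected.items() if extracted.get(key) == value
    )
    return correct / len(expected)


# Invented sample standing in for the OCR output of a scanned deed.
sample_text = (
    "Deed of Transfer T 12345/2018 in respect of ERF 678 "
    "measuring 495 square metres"
)
# The expected extent differs from the text to imitate one OCR misread.
expected = {"title_number": "12345/2018", "erf_number": "678", "extent": "496"}

extracted = extract_fields(sample_text)
accuracy = field_accuracy(expected, extracted)
print(extracted)
print(f"field-level accuracy: {accuracy:.1%}")
```

Scoring per field rather than per character keeps the comparison close to how a data capturer's work would be checked, but a character-level or edit-distance measure would be an equally plausible reading of the abstract.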
Files
Original bundle
Name: thesis_com_2019_favish_ashleigh.pdf
Size: 1.55 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission
Collections