Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)

Master Thesis

2019

Permanent link to this Item
Authors
Supervisors
Journal Title
Link to Journal
Journal ISSN
Volume Title
Publisher
Publisher
License
Series
Abstract
The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry is facing a current backlog in registering an estimated 900,000 title deeds. Providing formal ownership, through title, is seen as necessary for unlocking the 'dead capital’ of unregistered property, fostering access to capital markets and poverty alleviation. Within the current legislative framework, the Deeds Registry only accepts paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the linguistics used in title deeds, text analysis and parsing is done using Regex. Uploading the scanned title deeds onto IPFS is as an additional security measure included in the pipeline. Previous research has failed to apply these techniques to formal land registration or other South African government institutions. The preliminary results show that this pipeline has an overall accuracy of 89.6%. This represents the comparison of the expected output to the output extracted using OCR. The results are significantly less accurate when classifying handwritten and stamped information. Thus, further measures are required to increase accuracy for these fields. The OCR accuracy was 98.3% for the fields extracted from typed text characters. This is within the accuracy range of manual data capture. A secondary quality check, which is currently done on manual data capture, would still be necessary to ensure accuracy of inputs. Overall it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline their processes while ensuring title deed validity.
Description

Reference:

Collections