Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data

dc.contributor.advisor: Britz, Stefan
dc.contributor.author: Perumal, Yevashan
dc.date.accessioned: 2024-06-19T07:22:12Z
dc.date.available: 2024-06-19T07:22:12Z
dc.date.issued: 2023
dc.date.updated: 2024-06-06T14:24:12Z
dc.description.abstract: Insurance underwriting can be time-consuming and costly for both insurers and customers. However, the insight gained is of critical importance in addressing the information asymmetry between insurers and customers in terms of establishing a customer's risk profile. Consequently, any test that assists in providing a risk assessment is critical in allowing insurance companies to manage risk and price their products appropriately. Gamma-glutamyl transferase (GGT) is an enzyme which has been used by insurers in underwriting medical tests as an indicator of potential adverse outcomes. However, due to complexities such as differing underwriting strategies, data collection and data storage issues, not every customer on an insurer's books will have a GGT value or even a complete data profile. This research investigates whether statistical techniques such as imputation and supervised learning can be used in conjunction with available medical, demographic, underwriting and policy data to accurately predict GGT values. A combination of multivariate imputation by chained equations (MICE) and extreme gradient boosted trees (XGBoost) offers a 31% improvement in accuracy compared to a naïve prediction. However, there does appear to be a limit to the performance achieved from all implemented techniques with the analysed dataset, with various model combinations yielding root mean squared error (RMSE) values within a narrow range. In addition, when comparing the predictions from a separate, unlabelled dataset to actual data, it appears as though predictions from the models cannot be reliably deemed to be from the same distribution. This indicates that further research is required before insurers can reliably switch out blood-work-based GGT results for those from a supervised learning model.
Keywords: insurance, underwriting, gamma-glutamyl transferase, imputation, supervised learning
dc.identifier.apacitation: Perumal, Y. (2023). Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data. Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/39916 en_ZA
dc.identifier.chicagocitation: Perumal, Yevashan. "Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data." Faculty of Science, Department of Statistical Sciences, 2023. http://hdl.handle.net/11427/39916 en_ZA
dc.identifier.citation: Perumal, Y. 2023. Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data. Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/39916 en_ZA
dc.identifier.ris:
TY - Thesis / Dissertation
AU - Perumal, Yevashan
AB - Insurance underwriting can be time-consuming and costly for both insurers and customers. However, the insight gained is of critical importance in addressing the information asymmetry between insurers and customers in terms of establishing a customer's risk profile. Consequently, any test that assists in providing a risk assessment is critical in allowing insurance companies to manage risk and price their products appropriately. Gamma-glutamyl transferase (GGT) is an enzyme which has been used by insurers in underwriting medical tests as an indicator of potential adverse outcomes. However, due to complexities such as differing underwriting strategies, data collection and data storage issues, not every customer on an insurer's books will have a GGT value or even a complete data profile. This research investigates whether statistical techniques such as imputation and supervised learning can be used in conjunction with available medical, demographic, underwriting and policy data to accurately predict GGT values. A combination of multivariate imputation by chained equations (MICE) and extreme gradient boosted trees (XGBoost) offers a 31% improvement in accuracy compared to a naïve prediction. However, there does appear to be a limit to the performance achieved from all implemented techniques with the analysed dataset, with various model combinations yielding root mean squared error (RMSE) values within a narrow range. In addition, when comparing the predictions from a separate, unlabelled dataset to actual data, it appears as though predictions from the models cannot be reliably deemed to be from the same distribution. This indicates that further research is required before insurers can reliably switch out blood-work-based GGT results for those from a supervised learning model. Keywords: insurance, underwriting, gamma-glutamyl transferase, imputation, supervised learning
DA - 2023
DB - OpenUCT
DP - University of Cape Town
KW - Statistical Sciences
LK - https://open.uct.ac.za
PY - 2023
T1 - Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data
TI - Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data
UR - http://hdl.handle.net/11427/39916
ER -
en_ZA
dc.identifier.uri: http://hdl.handle.net/11427/39916
dc.identifier.vancouvercitation: Perumal Y. Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data. Faculty of Science, Department of Statistical Sciences, 2023 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/39916 en_ZA
dc.language.rfc3066: Eng
dc.publisher.department: Department of Statistical Sciences
dc.publisher.faculty: Faculty of Science
dc.subject: Statistical Sciences
dc.title: Applying imputation and statistical learning to predict gamma-glutamyl transferase in underwriting data
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2023_perumal yevashan.pdf
Size: 1.8 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission