Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa

Masconi, Katya L; Matsha, Tandi E; Erasmus, Rajiv T; Kengne, Andre P

dc.contributor.author	Masconi, Katya L	en_ZA
dc.contributor.author	Matsha, Tandi E	en_ZA
dc.contributor.author	Erasmus, Rajiv T	en_ZA
dc.contributor.author	Kengne, Andre P	en_ZA
dc.date.accessioned	2015-11-18T07:11:44Z
dc.date.available	2015-11-18T07:11:44Z
dc.date.issued	2015	en_ZA
dc.description.abstract	BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. METHODS: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. RESULTS: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. CONCLUSIONS: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.	en_ZA
dc.identifier.apacitation	Masconi, K. L., Matsha, T. E., Erasmus, R. T., & Kengne, A. P. (2015). Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa. PLoS One, http://hdl.handle.net/11427/15142	en_ZA
dc.identifier.chicagocitation	Masconi, Katya L, Tandi E Matsha, Rajiv T Erasmus, and Andre P Kengne. "Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa." PLoS One (2015). http://hdl.handle.net/11427/15142	en_ZA
dc.identifier.citation	Masconi, K. L., Matsha, T. E., Erasmus, R. T., & Kengne, A. P. (2015). Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa. PloS one, 10(9), e0139210. doi:10.1371/journal.pone.0139210	en_ZA
dc.identifier.ris	TY - Journal Article AU - Masconi, Katya L AU - Matsha, Tandi E AU - Erasmus, Rajiv T AU - Kengne, Andre P AB - BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. METHODS: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. RESULTS: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. CONCLUSIONS: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation. DA - 2015 DB - OpenUCT DO - 10.1371/journal.pone.0139210 DP - University of Cape Town J1 - PLoS One LK - https://open.uct.ac.za PB - University of Cape Town PY - 2015 T1 - Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa TI - Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa UR - http://hdl.handle.net/11427/15142 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/15142
dc.identifier.uri	http://dx.doi.org/10.1371/journal.pone.0139210
dc.identifier.vancouvercitation	Masconi KL, Matsha TE, Erasmus RT, Kengne AP. Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa. PLoS One. 2015; http://hdl.handle.net/11427/15142.	en_ZA
dc.language.iso	eng	en_ZA
dc.publisher	Public Library of Science	en_ZA
dc.publisher.department	Department of Medicine	en_ZA
dc.publisher.faculty	Faculty of Health Sciences	en_ZA
dc.publisher.institution	University of Cape Town
dc.rights	This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.	en_ZA
dc.rights.holder	© 2015 Masconi et al	en_ZA
dc.rights.uri	http://creativecommons.org/licenses/by/4.0	en_ZA
dc.source	PLoS One	en_ZA
dc.source.uri	http://journals.plos.org/plosone	en_ZA
dc.subject.other	Diabetes mellitus	en_ZA
dc.subject.other	Forecasting	en_ZA
dc.subject.other	Blood pressure	en_ZA
dc.subject.other	Database and informatics methods	en_ZA
dc.subject.other	Body mass index	en_ZA
dc.subject.other	Hypertension	en_ZA
dc.subject.other	Parenting behavior	en_ZA
dc.subject.other	South Africa	en_ZA
dc.title	Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa	en_ZA
dc.type	Journal Article	en_ZA
uct.type.filetype	Text
uct.type.filetype	Image
uct.type.publication	Research	en_ZA
uct.type.resource	Article	en_ZA
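
The abstract names three ideas that are easier to follow with a small worked sketch: filling missing values with a simple method, scoring discrimination with the C-statistic, and recalibrating a model through intercept adjustment. The Python below is an illustration only and is not the authors' code. The function names (`c_statistic`, `mean_impute`, `intercept_shift`) are hypothetical, mean imputation stands in for "simple imputation" as an assumption, and the intercept shift is found by bisection so that the mean predicted risk equals the observed prevalence, which is one common way to do intercept-only recalibration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a linear predictor to a risk in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def c_statistic(scores, outcomes):
    """Share of case/non-case pairs in which the case has the higher score.

    Ties count as half a win. Outcomes are coded 1 (case) or 0 (non-case).
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def intercept_shift(linear_predictors, outcomes, lo=-20.0, hi=20.0, tol=1e-10):
    """Find the constant to add to every linear predictor so that the mean
    predicted risk equals the observed outcome prevalence (bisection search).

    Assumes the prevalence is strictly between 0 and 1.
    """
    target = sum(outcomes) / len(outcomes)

    def mean_risk(delta):
        risks = [sigmoid(lp + delta) for lp in linear_predictors]
        return sum(risks) / len(risks)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because adding a constant to every linear predictor preserves the ranking of individuals, an intercept adjustment of this kind changes calibration but leaves the C-statistic unchanged in this sketch.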

Files

Original bundle

Name: Masconi_Effects_Missing_Data_2015.pdf
Size: 674.68 KB
Format: Adobe Portable Document Format

Collections

Journal Articles