Statistical model selection techniques for the Cox proportional hazards model: a comparative study

dc.contributor.advisor Gumedze, Freedom
dc.contributor.author Njati, Jolando
dc.date.accessioned 2022-07-01T15:26:47Z
dc.date.available 2022-07-01T15:26:47Z
dc.date.issued 2022
dc.date.updated 2022-07-01T15:24:00Z
dc.description.abstract Advances in data-acquisition technology continue to produce survival data sets with many covariates. This poses a new challenge for researchers in identifying the covariates that are important for inference and prediction for a time-to-event response variable. In this dissertation, common Cox proportional hazards (PH) model selection techniques and a random survival forest technique were compared using five performance measures, which included the concordance index, the integrated area under the curve, and R². A multicentre clinical trial data set was used for the comparison, and a simulation study was also carried out. To develop each Cox PH model, a training data set of 75% of the observations was used, and the model selection techniques were applied to select covariates. Full Cox PH models containing all covariates were also included in the analysis of both the clinical trial data set and the simulations. On the clinical trial data set, the full model and the forward selection technique performed better on the performance metrics employed, although they do not reduce the complexity of the model as much as the Lasso technique does. The simulation studies also showed that the full model performed better than the other techniques, with the Lasso technique overpenalising the model in the simulation with the smaller data set and many covariates. AIC and BIC were less computationally efficient than the other variable selection techniques, but reduced model complexity more effectively than their counterparts in the simulations. The integrated area under the curve was the performance metric used to choose the final model for the analysis of the real data set. Unlike the other metrics, this performance metric gave more efficient outcomes across all selection techniques.
This dissertation therefore showed that the performance of variable selection techniques differs according to the study design and the performance measure used. Hence, to obtain a good model, it is important not to use a model selection technique in isolation. There is therefore a need for further research to identify and publish techniques that work generally well across different study designs, to shorten the process for most researchers.
dc.identifier.apacitation Njati, J. (2022). <i>Statistical model selection techniques for the Cox proportional hazards model: a comparative study</i>. Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/36594 en_ZA
dc.identifier.chicagocitation Njati, Jolando. <i>"Statistical model selection techniques for the Cox proportional hazards model: a comparative study."</i> Faculty of Science, Department of Statistical Sciences, 2022. http://hdl.handle.net/11427/36594 en_ZA
dc.identifier.citation Njati, J. 2022. Statistical model selection techniques for the Cox proportional hazards model: a comparative study. Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/36594 en_ZA
dc.identifier.ris TY - Master Thesis AU - Njati, Jolando DA - 2022 DB - OpenUCT DP - University of Cape Town KW - survival analysis KW - simulation KW - Cox proportional hazard model selection KW - integrated area under the curve LK - https://open.uct.ac.za PY - 2022 T1 - Statistical model selection techniques for the Cox proportional hazards model: a comparative study TI - Statistical model selection techniques for the Cox proportional hazards model: a comparative study UR - http://hdl.handle.net/11427/36594 ER - en_ZA
dc.identifier.uri http://hdl.handle.net/11427/36594
dc.identifier.vancouvercitation Njati J. Statistical model selection techniques for the Cox proportional hazards model: a comparative study. Faculty of Science, Department of Statistical Sciences, 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/36594 en_ZA
dc.language.rfc3066 eng
dc.publisher.department Department of Statistical Sciences
dc.publisher.faculty Faculty of Science
dc.subject survival analysis
dc.subject simulation
dc.subject Cox proportional hazard model selection
dc.subject integrated area under the curve
dc.title Statistical model selection techniques for the Cox proportional hazards model: a comparative study
dc.type Master Thesis
dc.type.qualificationlevel Masters
dc.type.qualificationlevel MSc
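The abstract names the concordance index as one of the performance measures used to compare the selection techniques. As a minimal illustration only (this is not the dissertation's code; the function name `concordance_index` and the toy data are hypothetical, and tied event times are ignored for simplicity), Harrell's concordance index can be sketched in plain Python as the share of comparable subject pairs in which the subject with the earlier observed event also has the higher predicted risk:

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and times[i] < times[j]. Ties in risk count as 0.5.
    Hypothetical helper for illustration; tied times are not handled.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event, higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (assumed, not from the thesis): third subject is censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value near 1 indicates that predicted risk orders the subjects well, while a value near 0.5 is what uninformative predictions would give.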
Files
Original bundle
Name: thesis_sci_2022_njati jolando.pdf
Size: 1.56 MB
Format: Adobe Portable Document Format