Ancestry-independent osteometric sex estimation from selected postcranial skeletal elements of South Africans: a machine learning approach

Master Thesis


Sex estimation, as part of a biological profile, has the power to halve the number of possible identities of unidentified skeletal remains. Postcranial elements have been studied in South Africa (SA) for the purpose of sex estimation and have often proven to be more accurate than the cranium. Estimation techniques using postcranial elements in SA almost exclusively utilise discriminant analysis to evaluate sex, but international publications have shown success using alternative machine learning (ML) algorithms. SA methods and standards are often restricted by limited sample size, a lack of robust statistical techniques in older publications, and the prerequisite of known or estimated ancestry. Most methods are specific to SA African, European or, more recently, Mixed ancestry groups and are unreliable when ancestry is unknown. The aim of this study was to apply a series of ML algorithms to train ancestry-independent sex classification models using postcranial osteometric measurements from the cadaveric skeletal remains of modern South Africans, focussing on long bone joints. The study consisted of a pooled, roughly demographically representative sample of 650 South Africans (325 male, 325 female). Twelve osteometric measurements were taken from the available left- and/or right-sided bones of each individual. All 12 measurements were sexually dimorphic, and differences between left- and right-sided bones were negligible. The dataset was subjected to ML algorithm training using univariate and multivariate predictor combinations. The best-performing ML algorithm, given the sample size and available predictors, was discriminant function analysis. Univariate model accuracies ranged from 80.5% to 89.1%, and multivariate model accuracies ranged from 84.5%, using 2 predictors, to 92.8%, using 12 predictors. An optimised 3-predictor model was able to predict sex with 92.7% accuracy.
Results from this study were comparable to those using ancestry-specific models and non-ancestry-specific models, where available. Findings from this study suggest that the inclusion of ancestry, when predicting sex using the elements examined, is not necessary as it does not significantly improve prediction accuracy.
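
For illustration only, the univariate case of discriminant function analysis can be sketched as follows. This is a minimal sketch and not the thesis's implementation: the measurements are synthetic, the group means and standard deviation are invented, and the function names (`sectioning_point`, `classify`, `loo_accuracy`) are hypothetical. It assumes equal group sizes, a common variance in both sexes, and larger male values on average, in which case the discriminant cut-off reduces to the midpoint of the two group means. Accuracy is estimated here by leave-one-out cross-validation, which is an assumption about the validation scheme.

```python
import random
from statistics import mean


def sectioning_point(males, females):
    """Cut-off for a two-group univariate discriminant function.

    Assumes equal priors and a pooled (common) variance, so the
    cut-off is simply the midpoint of the two group means.
    """
    return (mean(males) + mean(females)) / 2.0


def classify(value, cutoff):
    """Assign 'M' above the cut-off, 'F' otherwise (assumes males are larger)."""
    return "M" if value > cutoff else "F"


def loo_accuracy(males, females):
    """Leave-one-out accuracy: refit the cut-off without each case, then classify it."""
    data = [(v, "M") for v in males] + [(v, "F") for v in females]
    correct = 0
    for i, (value, sex) in enumerate(data):
        train = data[:i] + data[i + 1:]
        m = [v for v, s in train if s == "M"]
        f = [v for v, s in train if s == "F"]
        if classify(value, sectioning_point(m, f)) == sex:
            correct += 1
    return correct / len(data)


# Synthetic, hypothetical joint measurements in mm (not data from the thesis).
rng = random.Random(0)
males = [rng.gauss(47.0, 2.5) for _ in range(325)]
females = [rng.gauss(41.0, 2.5) for _ in range(325)]

cutoff = sectioning_point(males, females)
accuracy = loo_accuracy(males, females)
print(f"cut-off: {cutoff:.1f} mm, leave-one-out accuracy: {accuracy:.1%}")
```

A multivariate model replaces the single cut-off with a weighted sum of several measurements, which this sketch does not cover.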