Outcome selection in longitudinal analysis of immunological data

Holcroft, Shannon

Thesis / Dissertation

2025

Publisher

University of Cape Town

Department

Department of Statistical Sciences

Faculty

Faculty of Science

Abstract

Immunological research often compares subgroups defined by exposure variables known (or hypothesised) to influence continuous immune responses. Many immune outcomes are measured over time, often in a small number of patients. Effective outcome selection ensures that research focuses on immune outcomes with the strongest signals for subgroup differences. This dissertation explores an outcome selection technique for longitudinal immunological data, addressing current methodological limitations and proposing improvements. The approach integrates statistical modelling with dimension reduction to identify immune outcomes with the most evidence for subgroup differences. By focusing on these subsets, fewer statistical hypotheses are tested simultaneously, preserving power when stricter significance thresholds are applied to reduce type-I error inflation. The dissertation examines the suitability of different longitudinal modelling frameworks. Generalised linear mixed-effects models are better suited to the characteristics of immunological data and research than linear mixed-effects models. Two dimension reduction techniques are compared: principal component analysis (PCA) and hierarchical cluster analysis (HCA) followed by PCA. PCA identifies the largest sources of variance across all outcomes, while HCA followed by PCA identifies variance within groups of similar outcomes. These techniques influence the definition of families of tests for false discovery rate (FDR) corrections. When outcomes are selected via PCA-only dimension reduction, more tests are performed simultaneously and require correction. It was hypothesised that HCA followed by PCA would yield more significant discoveries after FDR control. However, fewer simultaneous comparisons did not reliably correspond with more statistically significant discoveries. 
The methodology was applied to a dataset from the South African Tuberculosis Vaccine Initiative (SATVI), focusing on 33 immune outcomes and three exposures: MVA85A priming, maternal Mycobacterium tuberculosis sensitisation (measured by a positive QuantiFERON-TB Gold test), and combinations of feeding practices and cotrimoxazole treatment. The analysis shows that different dimension reduction techniques lead to different outcome selections and families of tests, emphasising the need to align analysis objectives with outcome selection techniques. This dissertation contributes to outcome selection methodology in high-dimensional, longitudinal settings, with broader applications in biomedical research.
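
The link between the size of a family of tests and the number of discoveries after FDR control can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the abstract does not name, and uses made-up p-values rather than any SATVI results; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up FDR control at level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
small_family = [0.004, 0.020, 0.030]
large_family = small_family + [0.20, 0.35, 0.50, 0.60, 0.75, 0.80, 0.95]

n_small = sum(benjamini_hochberg(small_family))
n_large = sum(benjamini_hochberg(large_family))
print(n_small, n_large)
```

In this constructed case the same three small p-values are all rejected in the family of three, but only one survives in the family of ten. This is the intuition behind the stated hypothesis; as the abstract reports, the pattern did not hold reliably in the real data.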

Keywords

principal component analysis

PCA

Collections

Masters
