Aspects of Bayesian inference, classification and anomaly detection

Doctoral Thesis


The primary objective of this thesis is to develop rigorous Bayesian tools for common statistical challenges arising in modern science, where there is a heightened demand for precise inference in the presence of large, known uncertainties. This thesis explores in detail two arenas where this demand arises. The first is the development and testing of a unified Bayesian anomaly detection and classification framework (BADAC), which allows principled anomaly detection in the presence of measurement uncertainties; such uncertainties are rarely incorporated into machine learning algorithms. BADAC deals with uncertainties by marginalising over the unknown, true value of the data. Using simulated data with Gaussian noise as an example, BADAC is shown to be superior to standard algorithms in both classification and anomaly detection performance in the presence of uncertainties. Additionally, BADAC provides well-calibrated classification probabilities, which are valuable in scientific pipelines. BADAC is therefore ideal where computational cost is not a limiting factor and statistical rigour is important. We discuss approximations to speed up BADAC, such as the use of Gaussian processes, and finally introduce a new metric, the Rank-Weighted Score (RWS), that is particularly suited to evaluating an algorithm's ability to detect anomalies.
The second major exploration in this thesis presents methods for rigorous statistical inference in the presence of classification uncertainties and errors. Although this is explored specifically through supernova cosmology, the approach is general. Supernova cosmology without spectra will be an important component of future surveys because of the massive increases in data volume expected from next-generation surveys such as those of the Vera C. Rubin Observatory. The lack of supernova spectra results in uncertainty in both the redshifts and the types of the supernovae which, if ignored, leads to significantly biased estimates of cosmological parameters.
We present a hierarchical Bayesian formalism, zBEAMS, which addresses this problem by marginalising over the unknown or uncertain supernova redshifts and types to produce unbiased cosmological estimates that are competitive with those obtained from supernova data with fully spectroscopically confirmed redshifts. zBEAMS thus provides a unified treatment of photometric redshifts, classification uncertainty and host galaxy misidentification, effectively correcting the inevitable contamination in the Hubble diagram with little or no loss of statistical power.
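As a rough illustration of what marginalising over an uncertain type means in practice, the Python sketch below evaluates a two-component mixture likelihood in which each object contributes as a Type Ia supernova with probability `p_ia` and as a contaminant otherwise. It is not the zBEAMS implementation: the function names, the Gaussian form of both populations, and the parameters `shift_non` and `sigma_non` describing the contaminants are assumptions made for the example, and the marginalisation over redshift is omitted for brevity.

```python
import numpy as np


def log_normal(x, mean, sigma):
    """Log of a Gaussian density, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mean) / sigma) ** 2


def type_marginalised_loglike(mu_obs, mu_model, p_ia, sigma_ia, shift_non, sigma_non):
    """Log-likelihood summed over objects, with each object's type marginalised.

    mu_obs    : observed distance moduli
    mu_model  : model distance moduli at each object's redshift
    p_ia      : per-object probability of being a Type Ia supernova
    sigma_ia  : assumed scatter of the Ia population
    shift_non : assumed mean offset of the contaminant population (hypothetical)
    sigma_non : assumed scatter of the contaminant population (hypothetical)
    """
    p_ia = np.asarray(p_ia, dtype=float)
    # log(0) = -inf is acceptable here: np.logaddexp handles it correctly.
    with np.errstate(divide="ignore"):
        log_ia = np.log(p_ia) + log_normal(mu_obs, mu_model, sigma_ia)
        log_non = np.log1p(-p_ia) + log_normal(mu_obs, mu_model + shift_non, sigma_non)
    # Sum over the two possible types inside the log, then sum over objects.
    return float(np.sum(np.logaddexp(log_ia, log_non)))


# Toy inputs, invented purely for illustration.
mu_obs = np.array([35.0, 36.2, 37.1])
mu_model = np.array([35.1, 36.0, 37.0])
p_ia = np.array([0.9, 0.5, 0.99])
loglike = type_marginalised_loglike(mu_obs, mu_model, p_ia, 0.15, 1.0, 0.6)
```

When every `p_ia` equals one, the expression reduces to the ordinary Gaussian likelihood for a spectroscopically confirmed sample, which is a convenient sanity check on any implementation of this kind.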