Biplots based on principal surfaces

Ganey, Raeesa

Doctoral Thesis

2019

Abstract

Principal surfaces are smooth two-dimensional surfaces that pass through the middle of a p-dimensional data set. They minimise the distance from the data points and provide a nonlinear summary of the data. The surfaces are nonparametric, and their shape is suggested by the data. A surface is constructed using an iterative procedure that starts with a linear summary, typically a principal component plane. Each successive iteration forms local averages of the p-dimensional points, where each average is based on the projection of a point onto the nonlinear surface of the previous iteration. Biplots are considered extensions of the ordinary scatterplot that provide for more than three variables. When the differences between data points are measured using a Euclidean embeddable dissimilarity function, observations and the associated variables can be displayed on a nonlinear biplot. A nonlinear biplot is predictive if information on variables is added in such a way that the values of the variables can be estimated for points in the biplot. Prediction trajectories, which tend to be nonlinear, are created on the biplot to allow information about variables to be estimated. The goal is to extend nonlinear biplot methodology to principal surfaces. The ultimate emphasis is on high-dimensional data, where the nonlinear biplot based on a principal surface allows for visualisation of samples, variable trajectories and predictive sets of contour lines. The proposed biplot provides more accurate predictions, with the additional feature of visualising the extent of nonlinearity that exists in the data.
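
The iterative procedure outlined in the abstract (a linear start on a principal component plane, followed by alternating local averaging and projection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the Gaussian kernel smoother, the fixed grid of candidate surface parameters, and the names `kernel_average`, `principal_surface`, `bandwidth` and `grid_size` are all hypothetical choices made for the example.

```python
import numpy as np


def kernel_average(lam_eval, lam_data, X, bandwidth):
    """Gaussian-kernel local average of the rows of X, evaluated at lam_eval."""
    d2 = ((lam_eval[:, None, :] - lam_data[None, :, :]) ** 2).sum(axis=2)
    # Shift each row by its minimum before exponentiating, so every row keeps
    # at least one nonzero weight; the normalised weights are unchanged.
    W = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)) / bandwidth**2)
    return (W / W.sum(axis=1, keepdims=True)) @ X


def principal_surface(X, n_iter=5, bandwidth=0.2, grid_size=25):
    """Crude principal-surface sketch (assumed smoother and grid projection)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Linear start: scores on the first two principal components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[:2].T  # (n, 2) surface parameters

    # Fixed grid of candidate parameter values spanning the initial scores.
    axes = [np.linspace(lam[:, k].min(), lam[:, k].max(), grid_size)
            for k in range(2)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 2)

    for _ in range(n_iter):
        # Local-averaging step: the surface at each grid node is a weighted
        # average of the p-dimensional points whose parameters lie nearby.
        surface = kernel_average(grid, lam, Xc, bandwidth)
        # Projection step: each point takes the parameters of the closest
        # node on the surface from the current iteration.
        d2 = ((Xc[:, None, :] - surface[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        lam = grid[nearest]

    # Surface parameters and the projected points in the original coordinates.
    return lam, surface[nearest] + mean
```

Projecting onto grid nodes rather than onto a continuous surface keeps the sketch short, at the cost of discretising the fitted parameters. The bandwidth is expressed in the units of the principal component scores and would need tuning for any real data set.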