A simple method for visualizing labelled and unlabelled data in high-dimensional spaces

Greene, J R


Other

2004

Publisher

University of Cape Town

Department

Department of Electrical Engineering

Faculty

Faculty of Engineering and the Built Environment

Abstract

The low-dimensional visualisation of high-dimensional data is a valuable way of detecting structure in the data (such as clusters and the presence of outliers) and of avoiding some of the pitfalls of blind data manipulation. Projection based on principal component analysis is widely employed and often useful, but it is a variance-preserving projection which takes no account of class labels and may, for this reason, hide significant structure. Here we present a very simple method which appears to yield useful visualisations for many datasets. It is based on a random search for a linear transformation and projection into a two-dimensional visual space that maximises an objective measure of class separability in the visual space. The method, which can be thought of as a variant of projection pursuit with a novel interest measure, is demonstrated on datasets from the UCI Repository. Tentative interim results are also given for a proposed extension of the method to unlabelled data, based on spectral clustering.
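The random search described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the separability measure or the search schedule, so everything below is an assumption made for illustration: the ratio of between-class to within-class scatter stands in for the paper's interest measure, candidates are generated by Gaussian perturbation of the current best projection, and the names `separability`, `random_search_projection`, `n_iter` and `step` are hypothetical.

```python
import numpy as np


def separability(Z, y):
    """Assumed interest measure: between-class over within-class scatter
    of the projected points Z (n_samples x 2). Not the paper's measure."""
    mean_all = Z.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        between += len(Zc) * np.sum((mc - mean_all) ** 2)
        within += np.sum((Zc - mc) ** 2)
    return between / (within + 1e-12)


def random_search_projection(X, y, n_iter=2000, step=0.1, seed=0):
    """Random search for a d x 2 linear projection W that maximises
    separability(X @ W, y). Keeps a candidate only if it improves the score."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Start from a random projection with orthonormal columns.
    W, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    best = separability(X @ W, y)
    for _ in range(n_iter):
        # Perturb the current best projection and re-orthonormalise it.
        cand, _ = np.linalg.qr(W + step * rng.normal(size=(d, 2)))
        score = separability(X @ cand, y)
        if score > best:
            W, best = cand, score
    return W, best
```

Calling `W, score = random_search_projection(X, y)` and plotting the two columns of `X @ W`, coloured by `y`, gives the kind of two-dimensional view the abstract refers to. Re-orthonormalising each candidate is a design choice of this sketch that keeps the two visual axes from collapsing onto one direction.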


Collections

Other / General
