ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice

dc.contributor.advisorPienaar, Etienne
dc.contributor.authorEvert, Francois
dc.date.accessioned2023-03-13T12:31:14Z
dc.date.available2023-03-13T12:31:14Z
dc.date.issued2022
dc.date.updated2023-02-20T12:44:14Z
dc.description.abstractThe abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics.
dc.identifier.apacitationEvert, F. (2022). <i>ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice</i>. (). ,Faculty of Science ,Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/37403en_ZA
dc.identifier.chicagocitationEvert, Francois. <i>"ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice."</i> ., ,Faculty of Science ,Department of Statistical Sciences, 2022. http://hdl.handle.net/11427/37403en_ZA
dc.identifier.citationEvert, F. 2022. ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice. . ,Faculty of Science ,Department of Statistical Sciences. http://hdl.handle.net/11427/37403en_ZA
dc.identifier.ris TY - Master Thesis AU - Evert, Francois AB - The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics. DA - 2022_ DB - OpenUCT DP - University of Cape Town KW - Statistical Sciences LK - https://open.uct.ac.za PY - 2022 T1 - ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice TI - ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice UR - http://hdl.handle.net/11427/37403 ER - en_ZA
dc.identifier.urihttp://hdl.handle.net/11427/37403
dc.identifier.vancouvercitationEvert F. ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice. []. ,Faculty of Science ,Department of Statistical Sciences, 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/37403en_ZA
dc.language.rfc3066eng
dc.publisher.departmentDepartment of Statistical Sciences
dc.publisher.facultyFaculty of Science
dc.subjectStatistical Sciences
dc.titleISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
dc.typeMaster Thesis
dc.type.qualificationlevelMasters
dc.type.qualificationlevelMSc
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
thesis_sci_2022_evert francois.pdf
Size:
10.72 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
0 B
Format:
Item-specific license agreed upon to submission
Description:
Collections