This thesis concerns t-SNE, a dimensionality reduction technique that has become popular for its ability to preserve well-separated clusters from a high-dimensional space. Our goal is twofold. First, we give an introduction to the use of dimensionality reduction techniques in visualization and, following recent research, show that t-SNE in particular is successful at preserving well-separated clusters. Second, we perform a thorough series of experiments that allow us to draw conclusions about the quality of embeddings obtained by running t-SNE on samples of data drawn with different sampling techniques. We compare pure random sampling, random walk sampling, and so-called hubness sampling on a dataset, in an attempt to find a sampling method that is consistently better at preserving local information than simple random sampling. In our tests, a specific variant of random walk sampling distinguished itself as a better alternative to pure random sampling.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-88585 |
Date | January 2019 |
Creators | Buljan, Matej |
Publisher | Linnéuniversitetet, Institutionen för matematik (MA) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |