Self-organizing maps (SOMs) are self-organized projections of high-dimensional data onto a low-dimensional, typically two-dimensional (2D), map in which vector similarity is implicitly translated into topological closeness in the 2D projection. They are thus used for clustering and visualization of high-dimensional data. However, it is often challenging to interpret the results because of drawbacks in the methods currently used to identify and visualize cluster boundaries in the resulting feature maps. In this thesis we introduce a new phase to the SOM that we refer to as the Cluster Reinforcement (CR) phase. The CR phase amplifies within-cluster similarity, with the consequence that cluster boundaries become much more evident. We also define a new Boundary (B) matrix that makes cluster boundaries easy to visualize, can be thresholded at various levels to make cluster hierarchies apparent, and can be overlain directly onto maps of component planes (something that was not possible with previous methods). The combination of the SOM, CR phase, and B-matrix constitutes an automated method for improved identification and informative visualization of clusters in high-dimensional data. We demonstrate these methods on three data sets: the classic 13-dimensional binary-valued “animal” benchmark test, an actual 60-dimensional binary-valued phonetic word clustering problem, and a 3-dimensional real-valued geographic data clustering problem related to fuel efficiency of vehicle choice.
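To illustrate how a SOM translates vector similarity into closeness on a 2D grid, the following is a minimal sketch of a generic online SOM in plain Python. It does not implement the CR phase or the B-matrix, whose definitions are not given in this abstract. The function names (`train_som`, `best_matching_unit`), the grid size, and the linear decay schedules for the learning rate and neighbourhood radius are all illustrative assumptions rather than the thesis's settings.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM (hypothetical defaults, not from the thesis).

    data: list of equal-length lists of floats.
    Returns weights[r][c], the weight vector of the node at grid cell (r, c).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: weights start as uniform random values in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):     # one shuffled pass over the data
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood measured on the 2D grid, not in data space.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull node toward the sample
    return weights
```

Because each update also moves the grid neighbours of the winning node, similar input vectors tend to end up mapped to nearby cells. Calling `best_matching_unit(weights, x)` for each sample after training gives its position on the map.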
Identifier | oai:union.ndltd.org:uvm.edu/oai:scholarworks.uvm.edu:graddis-1146 |
Date | 18 July 2011 |
Creators | Manukyan, Narine |
Publisher | ScholarWorks @ UVM |
Source Sets | University of Vermont |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Graduate College Dissertations and Theses |