Clustering is a mathematical method of data analysis which identifies trends in data by efficiently separating data into a specified number of clusters so is incredibly useful and widely applicable for questions of interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem where the solution is given by the eigenvectors of the Normalized Graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than similarity between different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.
Identifer | oai:union.ndltd.org:CLAREMONT/oai:scholarship.claremont.edu:scripps_theses-1173 |
Date | 01 January 2013 |
Creators | Mayer-Jochimsen, Morgan |
Publisher | Scholarship @ Claremont |
Source Sets | Claremont Colleges |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Scripps Senior Theses |
Rights | © 2013 Morgan Mayer-Jochimsen, default |
Page generated in 0.0018 seconds