Return to search

Clustering Methods and Their Applications to Adolescent Healthcare Data

Clustering is a mathematical method of data analysis which identifies trends in data by efficiently separating data into a specified number of clusters so is incredibly useful and widely applicable for questions of interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem where the solution is given by the eigenvectors of the Normalized Graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than similarity between different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.

Identiferoai:union.ndltd.org:CLAREMONT/oai:http://scholarship.claremont.edu/do/oai/:scripps_theses-1173
Date01 April 2013
CreatorsMayer-Jochimsen, Morgan
PublisherScholarship @ Claremont
Source SetsClaremont Colleges
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceScripps Senior Theses
Rights© 2013 Morgan Mayer-Jochimsen

Page generated in 0.0017 seconds