
Machine Learning on Statistical Manifold

This senior thesis project explores and generalizes fundamental machine learning algorithms from Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method to classify and cluster probability distributions. In these adaptations, we use statistical distances as the measure of dissimilarity between objects. We describe a situation in which clustering probability distributions is both needed and useful. We present promising empirical clustering results, which demonstrate that the statistical-distance-based clustering algorithms often outperform their Euclidean-distance counterparts in complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to univariate normal distributions with k = 2 and k = 3 clusters, bivariate normal distributions with diagonal covariance matrices and k = 3 clusters, and discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm applied to discrete distributions with the Hellinger distance converges not only to a partial optimal solution but also to a local minimum.
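To make the distance swap concrete, the sketch below shows a k-means-style clustering of discrete distributions under the Hellinger distance, H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2. This is only an illustrative sketch of the idea described in the abstract, not the thesis's actual implementation: the function names (hellinger, kmeans_hellinger), the centroid update rule (normalized cluster mean), and the toy Poisson example are assumptions introduced here for demonstration.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors p and q."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)


def kmeans_hellinger(dists, k, n_iter=100, seed=0):
    """k-means-style clustering of discrete distributions under the Hellinger distance.

    dists: (n, m) array whose rows are probability vectors over m outcomes.
    The centroid update (normalized mean of cluster members) is an illustrative
    choice, not necessarily the update rule analyzed in the thesis.
    """
    rng = np.random.default_rng(seed)
    centroids = dists[rng.choice(len(dists), size=k, replace=False)].copy()
    labels = np.full(len(dists), -1)
    for _ in range(n_iter):
        # Assignment step: each distribution joins its nearest centroid,
        # with "nearest" measured by the Hellinger distance.
        new_labels = np.array(
            [np.argmin([hellinger(x, c) for c in centroids]) for x in dists]
        )
        if np.array_equal(new_labels, labels):
            break  # assignments have stabilized
        labels = new_labels
        # Update step: recompute each centroid from its cluster members and
        # renormalize so it remains a valid probability vector.
        for j in range(k):
            members = dists[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
                centroids[j] /= centroids[j].sum()
    return labels, centroids


if __name__ == "__main__":
    # Toy example: three Poisson distributions (truncated to a finite support)
    # clustered into k = 2 groups.
    from scipy.stats import poisson

    support = np.arange(0, 30)
    pmfs = np.array([poisson.pmf(support, mu) for mu in (2.0, 2.5, 15.0)])
    pmfs /= pmfs.sum(axis=1, keepdims=True)  # renormalize the truncated pmfs
    print(kmeans_hellinger(pmfs, k=2)[0])
```

Under these assumptions, the first two (similar) Poisson distributions would typically land in one cluster and the third in another, illustrating how a statistical distance drives the grouping.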

Identifier: oai:union.ndltd.org:CLAREMONT/oai:scholarship.claremont.edu:hmc_theses-1095
Date: 01 January 2017
Creators: Zhang, Bo
Publisher: Scholarship @ Claremont
Source Sets: Claremont Colleges
Detected Language: English
Type: text
Format: application/pdf
Source: HMC Senior Theses
Rights: © 2017 Bo Zhang