This senior thesis project explores and generalizes fundamental machine learning algorithms from Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method to classify and cluster probability distributions. In these modifications, we use statistical distances to measure the dissimilarity between objects. We describe a situation in which clustering probability distributions is needed and useful, and we present promising empirical results demonstrating that the statistical-distance-based clustering algorithms often outperform their Euclidean-distance counterparts in complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to univariate normal distributions with k = 2 and k = 3 clusters, to bivariate normal distributions with diagonal covariance matrices and k = 3 clusters, and to discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm with the Hellinger distance, applied to discrete distributions, converges not only to a partial optimal solution but also to a local minimum.
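To illustrate the kind of algorithm the abstract describes, the sketch below shows a Lloyd-style k-means loop on discrete distributions that uses the Hellinger distance as the dissimilarity measure. This is a minimal illustrative sketch, not the thesis's code: the function names (`hellinger`, `kmeans_hellinger`) and the centroid update (renormalized squared mean of the square-root vectors, one natural choice for the squared Hellinger objective) are assumptions and may differ from the update rule analyzed in the thesis.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (1-D arrays summing to 1)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kmeans_hellinger(dists, k, n_iter=100, seed=0):
    """k-means on discrete distributions with the Hellinger distance (illustrative sketch).

    dists: array of shape (n, m); each row is a probability vector over m outcomes.
    Returns cluster labels and centroid distributions.
    """
    rng = np.random.default_rng(seed)
    n, _ = dists.shape
    # Initialize centroids by picking k of the input distributions at random.
    centroids = dists[rng.choice(n, size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the Hellinger distance.
        d = np.array([[hellinger(p, c) for c in centroids] for p in dists])
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments have stabilized
        labels = new_labels
        # Update step (assumed rule): squared mean of square-root vectors, renormalized.
        for j in range(k):
            members = dists[labels == j]
            if len(members) == 0:
                continue  # keep the old centroid if a cluster empties
            c = np.mean(np.sqrt(members), axis=0) ** 2
            centroids[j] = c / c.sum()
    return labels, centroids

# Example usage on random discrete distributions over 5 outcomes.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.dirichlet(np.ones(5), size=30)
    labels, centroids = kmeans_hellinger(sample, k=3)
    print(labels)
```

Replacing `hellinger` with the Euclidean distance (and the centroid update with a plain mean) recovers the standard k-means baseline that the thesis compares against.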
Identifier | oai:union.ndltd.org:CLAREMONT/oai:scholarship.claremont.edu:hmc_theses-1095 |
Date | 01 January 2017 |
Creators | Zhang, Bo |
Publisher | Scholarship @ Claremont |
Source Sets | Claremont Colleges |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | HMC Senior Theses |
Rights | © 2017 Bo Zhang |