Clustering data by identifying a subset of representative examples is important for detecting patterns in data and in processing sensory signals. Such "exemplars" can be found by randomly choosing an initial subset of data points as exemplars and then iteratively refining it, but this works well only if that initial choice is close to a good solution. This thesis describes a method called "affinity propagation" that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.
Affinity propagation takes as input a set of pairwise similarities between data points and finds clusters on the basis of maximizing the total similarity between data points and their exemplars. Similarity can be simply defined as negative squared Euclidean distance for compatibility with other algorithms, or it can incorporate richer domain-specific models (e.g., translation-invariant distances for comparing images). Affinity propagation’s computational and memory requirements scale linearly with the number of similarities input; for non-sparse problems where all possible similarities are computed, these requirements scale quadratically with the number of data points. Affinity propagation is demonstrated on several applications from areas such as computer vision and bioinformatics, and it typically finds better clustering solutions than other methods in less time.
Identifer | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/17755 |
Date | 24 September 2009 |
Creators | Dueck, Delbert |
Contributors | Frey, Brendan J. |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |
Page generated in 0.0023 seconds