
Assessing and quantifying clusteredness: The OPTICS Cordillera

This article provides a framework for assessing and quantifying "clusteredness" of a data representation.
Clusteredness is a global univariate property defined as a layout diverging from equidistance of points
to the closest neighboring point set. The OPTICS algorithm encodes the global clusteredness as a pair of
clusteredness-representative distances and an algorithmic ordering. We use this to construct an index for
quantification of clusteredness, coined the OPTICS Cordillera, as the norm of subsequent differences over
the pair. We provide lower and upper bounds and a normalization for the index. We show the index captures
important aspects of clusteredness such as cluster compactness, cluster separation, and number of
clusters simultaneously. The index can be used as a goodness-of-clusteredness statistic, as a function over
a grid, or to compare different representations. For illustration, we apply the index to
dimensionality-reduced 2D representations of Californian counties with respect to 48 climate-change-related variables.
Online supplementary material is available (including an R package, the data and additional mathematical
details).
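
As a rough illustration of the "norm of subsequent differences" idea, the sketch below computes an unnormalized index from a sequence of reachability distances that is assumed to be already arranged in OPTICS order. It is not the definition used in the article or its R package: the function name `raw_cordillera`, the cap `d_max`, and the exponent `q` are hypothetical, and the article's bounds and normalization are not reproduced here.

```python
import math


def raw_cordillera(reachability, d_max, q=1):
    """Illustrative q-norm of successive differences of reachability distances.

    Assumptions (not taken from the article):
      - `reachability` is already in OPTICS order,
      - undefined/infinite reachabilities are capped at `d_max`,
      - no normalization is applied.
    """
    if d_max <= 0 or q < 1:
        raise ValueError("d_max must be positive and q must be at least 1")
    # Cap each distance so that infinite values do not dominate the sum.
    capped = [min(r, d_max) for r in reachability]
    # Absolute differences between neighbours in the ordering.
    diffs = [abs(b - a) for a, b in zip(capped, capped[1:])]
    return sum(d ** q for d in diffs) ** (1.0 / q)


# Equidistant points: flat reachability profile, so no "peaks".
flat = raw_cordillera([1.0, 1.0, 1.0, 1.0], d_max=2.0)

# Two tight groups separated by a large jump (math.inf marks the start point).
clustered = raw_cordillera([math.inf, 0.1, 0.1, 2.0, 0.1, 0.1], d_max=2.0)

print(flat, clustered)
```

Under these assumptions a flat profile yields zero, while compact, well-separated groups produce large jumps in the profile and hence a larger value, which matches the behavior the abstract describes.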

Identifier: oai:union.ndltd.org:VIENNA/oai:epub.wu-wien.ac.at:5725
Date: 22 June 2018
Creators: Rusch, Thomas; Hornik, Kurt; Mair, Patrick
Publisher: Taylor & Francis
Source Sets: Wirtschaftsuniversität Wien
Language: English
Detected Language: English
Type: Article, PeerReviewed
Format: application/pdf
Rights: Creative Commons: Attribution 3.0 Austria
Relation: http://dx.doi.org/10.1080/10618600.2017.1349664, http://www.tandfonline.com, http://epub.wu.ac.at/5725/
