With the growth of geo-referenced data and the increasing sophistication and complexity of spatial databases, data mining and knowledge discovery techniques have become essential tools for analyzing large spatial datasets. Spatial clustering is central to geographical data mining. It partitions a dataset into smaller homogeneous groups based on spatial proximity. The resulting groups represent geographically interesting concentrations that warrant further investigation to find possible causal factors. In this thesis, we propose a spatial-dominant generalization approach that mines multivariate causal associations among geographical data layers using clustering analysis. First, we propose a generic framework for multi-purpose exploratory spatial clustering in the form of the Template Method pattern. Based on this object-oriented framework, we design and implement an automatic, multi-purpose exploratory spatial clustering tool. The first instance of the framework uses the Delaunay diagram as its underlying proximity graph. Our spatial clustering incorporates the peculiar characteristics of spatial data that make space special. As a result, it identifies high-quality spatial clusters within O(n log n) time, including clusters of arbitrary shapes, clusters of heterogeneous densities, clusters of different sizes, closely located high-density clusters, clusters connected by multiple chains, sparse clusters near high-density clusters, and clusters containing clusters. It derives parameter values from the data, which maximizes user-friendliness and minimizes the user-oriented bias and constraints that hinder exploratory data analysis and geographical data mining. The sheer volume of data stored in spatial databases is not the only concern. The heterogeneity of datasets is a common issue in data-rich environments, yet it is left unaddressed by existing exploratory tools.
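To make proximity-graph clustering concrete, here is a minimal Python sketch; it is an illustration, not the thesis's method. Assumptions: it swaps the Delaunay diagram for a Euclidean minimum spanning tree (a subgraph of the Delaunay diagram) so that only the standard library is needed, builds that tree with Prim's O(n²) algorithm rather than in O(n log n), and cuts edges with a single global threshold (mean edge length plus k standard deviations) instead of locally adaptive criteria. The names `mst_edges`, `cluster`, and the factor `k` are hypothetical.

```python
import math
from statistics import mean, pstdev

def mst_edges(points):
    """Prim's algorithm in O(n^2): returns MST edges as (length, i, j)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the cheapest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

def cluster(points, k=1.0):
    """Label points by cutting MST edges longer than mean + k * stdev."""
    edges = mst_edges(points)
    if not edges:
        return [0] * len(points)
    lengths = [length for length, _, _ in edges]
    cutoff = mean(lengths) + k * pstdev(lengths)  # derived from the data
    uf = list(range(len(points)))  # union-find parent array

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    for length, i, j in edges:
        if length <= cutoff:  # keep short edges, drop long "bridges"
            uf[find(i)] = find(j)
    # relabel component roots as 0, 1, 2, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster(pts))  # [0, 0, 0, 1, 1, 1]
```

In this example the long edge bridging the two groups exceeds the data-derived cutoff and is removed, leaving two clusters. A single global cutoff like this struggles with clusters of heterogeneous densities, one of the cases the abstract highlights.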
Our spatial clustering extends to the Minkowski metric, with or without obstacles, to handle situations where interactions between spatial objects are not adequately modeled by Euclidean distance. The methodology is generic enough to extend to various spatial proximity graphs beyond the default Delaunay diagram. We also investigate an extension to higher-dimensional datasets that robustly identifies higher-dimensional clusters within O(n log n) time. The versatility of our clustering is further illustrated by its application to multi-level clustering: we develop a multi-level clustering method that reveals hierarchical structures hidden in complex datasets within O(n log n) time, and we introduce weighted dendrograms to visualize the cluster hierarchies effectively. Because the interpretability and usability of clustering results are of great importance, we propose an automatic pattern spotter that produces high-level descriptions of clusters. Towards mining causal associations, we develop an effective and efficient cluster polygonization process. It automatically approximates the shapes of clusters and robustly reveals asymmetric causal associations among data layers. Because it does not require domain-specific concept hierarchies, it is more widely applicable.
PhD Doctorate
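The Minkowski (L_p) distance generalizes Euclidean distance: d_p(x, y) = (Σ_i |x_i - y_i|^p)^(1/p), which gives Manhattan distance for p = 1, Euclidean distance for p = 2, and the Chebyshev (maximum) distance in the limit p → ∞. The sketch below is an illustration assuming plain point coordinates; it does not model obstacles, and the function name `minkowski` is hypothetical.

```python
import math
from typing import Sequence

def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski (L_p) distance between two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    if p < 1:
        # for p < 1 the formula violates the triangle inequality (not a metric)
        raise ValueError("p must be >= 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs, default=0.0)  # Chebyshev limit as p -> infinity
    return sum(d ** p for d in diffs) ** (1.0 / p)

print(minkowski((0, 0), (3, 4), 1))         # 7.0 (Manhattan)
print(minkowski((0, 0), (3, 4), 2))         # 5.0 (Euclidean)
print(minkowski((0, 0), (3, 4), math.inf))  # 4 (Chebyshev)
```

Because the metric is passed as a single function, a proximity-based clusterer can switch between p values without other changes; modeling obstacles would additionally require shortest paths that avoid them, which this sketch does not attempt.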
Identifier | oai:union.ndltd.org:ADTP/189498 |
Date | January 2002 |
Creators | Lee, Ickjai |
Source Sets | Australasian Digital Theses Program |
Language | English |
Rights | http://www.newcastle.edu.au/copyright.html, Copyright 2002 Ickjai Lee |