Global ETD Search

1	DISCOVERY OF CLUSTERS IN SPATIAL DATABASES BATRA, SHALINI January 2003 (has links) No description available. Computer Science spatial data mining clustering quad-tree clusters
2	FINDING CLUSTERS IN SPATIAL DATA SHENCOTTAH K.N., KALYANKUMAR 03 July 2007 (has links) No description available. Computer Science clusters Spatial data mining Quad-Tree Spatial clustering
3	Multi-Purpose Boundary-Based Clustering on Proximity Graphs for Geographical Data Mining Lee, Ickjai Lee January 2002 (has links) With the growth of geo-referenced data and the sophistication and complexity of spatial databases, data mining and knowledge discovery techniques become essential tools for successful analysis of large spatial datasets. Spatial clustering is fundamental and central to geographical data mining. It partitions a dataset into smaller homogeneous groups due to spatial proximity. Resulting groups represent geographically interesting patterns of concentrations for which further investigations should be undertaken to find possible causal factors. In this thesis, we propose a spatial-dominant generalization approach that mines multivariate causal associations among geographical data layers using clustering analysis. First, we propose a generic framework of multi-purpose exploratory spatial clustering in the form of the Template-Method Pattern. Based on an object-oriented framework, we design and implement an automatic multi-purpose exploratory spatial clustering tool. The first instance of this framework uses the Delaunay diagram as an underlying proximity graph. Our spatial clustering incorporates the peculiar characteristics of spatial data that make space special. Thus, our method is able to identify high-quality spatial clusters including clusters of arbitrary shapes, clusters of heterogeneous densities, clusters of different sizes, closely located high-density clusters, clusters connected by multiple chains, sparse clusters near to high-density clusters and clusters containing clusters within O(n log n) time. It derives values for parameters from data and thus maximizes user-friendliness. Therefore, our approach minimizes user-oriented bias and constraints that hinder exploratory data analysis and geographical data mining. Sheer volume of spatial data stored in spatial databases is not the only concern. 
The heterogeneity of datasets is a common issue in data-rich environments, but left open by exploratory tools. Our spatial clustering extends to the Minkowski metric in the absence or presence of obstacles to deal with situations where interactions between spatial objects are not adequately modeled by the Euclidean distance. The genericity is such that our clustering methodology extends to various spatial proximity graphs beyond the default Delaunay diagram. We also investigate an extension of our clustering to higher-dimensional datasets that robustly identify higher-dimensional clusters within O(n log n) time. The versatility of our clustering is further illustrated with its deployment to multi-level clustering. We develop a multi-level clustering method that reveals hierarchical structures hidden in complex datasets within O(n log n) time. We also introduce weighted dendrograms to effectively visualize the cluster hierarchies. Interpretability and usability of clustering results are of great importance. We propose an automatic pattern spotter that reveals high level description of clusters. We develop an effective and efficient cluster polygonization process towards mining causal associations. It automatically approximates shapes of clusters and robustly reveals asymmetric causal associations among data layers. Since it does not require domain-specific concept hierarchies, its applicability is enhanced. / PhD Doctorate Geographical data mining clustering spatial data mining association rules mining post-clustering
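The abstract above mentions extending the clustering to the Minkowski metric for cases where Euclidean distance is a poor model. As a minimal sketch of what that family of metrics looks like (the function name `minkowski_distance` and its parameters are illustrative assumptions, not taken from the thesis, and obstacles are not modeled):

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length coordinate tuples.

    p=1 gives Manhattan distance, p=2 gives Euclidean distance, and
    p=float("inf") gives Chebyshev distance. Hypothetical helper for
    illustration only; it is not the thesis implementation.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    if p == float("inf"):
        # Limit case: the largest coordinate-wise difference.
        return float(max((abs(x - y) for x, y in zip(a, b)), default=0.0))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
```

Swapping `p` changes which points count as "near" without changing the rest of a proximity-based method, which is one way to read the genericity claimed in the abstract.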
4	An Efficient Hilbert Curve-based Clustering Strategy for Large Spatial Databases Lu, Yun-Tai 25 July 2003 (has links) Recently, millions of databases have come into use, and we need new techniques that can automatically transform the processed data into useful information and knowledge. Data mining is the technique of analyzing data to discover previously unknown information, and spatial data mining is the branch of data mining that deals with spatial data. In spatial data mining, clustering is one of the useful techniques for discovering interesting data in the underlying data objects. The clustering problem is as follows: given n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than to data points in different clusters. Cluster analysis has been widely applied in many areas, such as medicine, social studies, bioinformatics, map regions, and GIS. In recent years, many researchers have focused on finding efficient methods for the clustering problem. In general, we can classify these clustering algorithms into four approaches: partitioning, hierarchical, density-based, and grid-based. The k-means algorithm, which is based on the partitioning approach, is probably the most widely applied clustering method. But a major drawback of the k-means algorithm is that it is difficult to determine the parameter k that represents the "natural" clusters, and it is only suitable for convex, spherical clusters. The k-means algorithm also has high computational complexity and is unable to handle large databases. Therefore, in this thesis, we present an efficient clustering algorithm for large spatial databases. It combines the hierarchical approach with the grid-based approach. We apply the grid-based approach because it is efficient for large spatial databases. Moreover, we apply the hierarchical approach to find the genuine clusters by repeatedly combining these blocks. 
Basically, we make use of the Hilbert curve to provide a way to linearly order the points of a grid. The Hilbert curve is a kind of space-filling curve, where a space-filling curve is a continuous path that passes through every point in a space exactly once, forming a one-to-one correspondence between the coordinates of the points and their one-dimensional sequence numbers on the curve. The goal of using a space-filling curve is to preserve locality: points that are close in 2-D space and represent similar data should be stored close together in the linear order. This kind of mapping can also minimize the disk access effort and provide high speed for clustering. This new algorithm requires only one input parameter and supports the user in determining an appropriate value for it. In our simulation, we have shown that our proposed clustering algorithm can have a shorter execution time than other algorithms for large databases. As the number of data points increases, the execution time of our algorithm increases slowly. Moreover, our algorithm can deal with clusters of arbitrary shape, which the k-means algorithm cannot discover. hierarchical clustering grid-based clustering clustering space-filling curve spatial data mining
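The linear ordering of grid cells described in this abstract can be sketched with the standard bit-manipulation mapping from grid coordinates to a Hilbert curve position. This is a generic illustration under stated assumptions: the names `hilbert_index` and `hilbert_sort` are hypothetical, the grid is assumed to be 2^order by 2^order with integer cell coordinates, and nothing here reproduces the clustering algorithm of the thesis itself.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve.

    The grid has side length 2**order; x and y are integers in
    [0, 2**order). Hypothetical helper for illustration.
    """
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes a block of s*s consecutive positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve in this quadrant has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(cells, order):
    """Return the cells sorted by their position along the Hilbert curve."""
    return sorted(cells, key=lambda c: hilbert_index(order, c[0], c[1]))
```

Cells that are consecutive in the sorted order are always adjacent in the grid, which is the locality property the abstract relies on for reducing disk access; the converse does not hold for every pair of neighboring cells.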
5	A Framework for Participatory Sensing Systems Mendez Chaves, Diego 01 January 2012 (has links) Participatory sensing (PS) systems are an emerging sensing paradigm based on the cooperative participation of cellular users. Due to the spatio-temporal granularity that a PS system can provide, it is now possible to detect and analyze events that occur at different scales, at a low cost. While PS systems present interesting characteristics, they also create new problems. Since the measuring devices are cheaper and they are in the hands of the users, PS systems face several design challenges related to the poor accuracy and high failure rate of the sensors, the possibility of malicious users tampering with the data, the violation of the privacy of the users, methods to encourage the participation of the users, and the effective visualization of the data. This dissertation presents four main contributions in order to solve some of these challenges. First, it presents a framework to guide the design and implementation of PS applications considering all these aspects. The framework consists of five modules: sample size determination, data collection, data verification, data visualization, and density map generation. The remaining contributions are mapped one-to-one to three of the modules of this framework: data verification, data visualization, and density maps. Data verification, in the context of PS, consists of the process of detecting and removing spatial outliers to properly reconstruct the variables of interest. A new algorithm for spatial outlier detection and removal is proposed, implemented, and tested. This hybrid neighborhood-aware algorithm considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and inaccurate and malfunctioning sensors. 
The experimental results show that the proposed algorithm performs as well as the best estimator while reducing the execution time considerably. The problem of data visualization in the context of PS applications is also of special interest. The characteristics of a typical PS application imply the generation of multivariate time-space series with many gaps in time and space. Considering this, a new method is presented based on the kriging technique along with Principal Component Analysis and Independent Component Analysis. Additionally, a new technique to interpolate data in time and space is proposed, which is more appropriate for PS systems. The results indicate that the accuracy of the estimates improves with the amount of data, i.e., one variable, multiple variables, and space and time data. Also, the results clearly show the advantage of a PS system compared with a traditional measuring system in terms of the precision and spatial resolution of the information provided to the users. One key challenge in PS systems is determining the number and locations of the users from whom to obtain samples, so that the variables of interest can be accurately represented with a low number of participants. To address this challenge, the use of density maps is proposed, a technique that is based on the current estimations of the variable. The density maps are then utilized by the incentive mechanism in order to encourage the participation of those users indicated in the map. The experimental results show how the density maps greatly improve the quality of the estimations while maintaining a stable and low total number of users in the system. P-Sense, a PS system to monitor pollution levels, has been implemented and tested, and is used as a validation example for all the contributions presented here. P-Sense integrates gas and environmental sensors with a cell phone in order to monitor air quality levels. 
ICA Kriging PCA Spatial data-mining Spatial interpolation Spatial outliers American Studies Arts and Humanities Computer Sciences
6	SPATIAL-TEMPORAL DATA ANALYTICS AND CONSUMER SHOPPING BEHAVIOR MODELING Yan, Ping January 2010 (has links) RFID technologies have recently been adopted in the retail space to track consumer in-store movements. The RFID-collected data are location sensitive and constantly updated as a consumer moves inside a store. By capturing the entire shopping process, including the movement path, rather than analyzing merely the shopping basket at check-out, the RFID-collected data provide unique and exciting opportunities to study consumer purchase behavior and thus lead to actionable marketing applications. This dissertation research focuses on (a) advancing the representation and management of the RFID-collected shopping path data; and (b) analyzing, modeling, and predicting customer shopping activities with a spatial pattern discovery approach and a dynamic probabilistic modeling based methodology to enable advanced spatial business intelligence. The spatial pattern discovery approach identifies similar consumers based on a similarity metric between consumer shopping paths. The direct applications of this approach include a novel consumer segmentation methodology and an in-store real-time product recommendation algorithm. A hierarchical decision-theoretic model based on dynamic Bayesian networks (DBN) is developed to model consumer in-store shopping activities. This model can be used to predict a shopper's purchase goal in real time, infer her shopping actions, and estimate the exact product she is viewing at a given time. We develop an approximate inference algorithm based on particle filters and a learning procedure based on the Expectation-Maximization (EM) algorithm to perform filtering and prediction for the network model. 
The developed models are tested on a real RFID-collected shopping trip dataset with promising results in terms of prediction accuracies of consumer purchase interests. This dissertation contributes to the marketing and information systems literature in several areas. First, it provides empirical insights about the correlation between spatial movement patterns and consumer purchase interests. Such correlation is demonstrated with in-store shopping data, but can be generalized to other marketing contexts such as store visit decisions by consumers and location and category management decisions by a retailer. Second, our study shows the possibility of utilizing consumer in-store movement to predict consumer purchases. The predictive models we developed have the potential to become the basis of an intelligent shopping environment where store managers customize marketing efforts to provide location-aware recommendations to consumers as they travel through the store. Consumer In-store Shopping Behavior Dynamic Bayesian Networks Location-aware Marketing Radio Frequency Identification Spatial Data Mining
7	FP-tree Based Spatial Co-location Pattern Mining Yu, Ping 05 1900 (has links) A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar shift of paradigm in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, an FP-tree based co-location mining framework and an algorithm called FP-CM, for FP-tree based co-location miner, are proposed. It is proved that FP-CM is complete, correct, and only requires a small constant number of database scans. The experimental results show that FP-CM outperforms the candidate generation-and-test based co-location miner by an order of magnitude. Data mining. Pattern perception. co-location pattern spatial databases spatial data mining
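To make "frequently located together" concrete, the sketch below scores a pair of feature types with the participation index, a prevalence measure commonly used in the co-location literature. The abstract does not say which measure FP-CM uses, so the measure, the function name `participation_index`, the distance-threshold neighbor relation, and the sample data are all assumptions for illustration; this is not the FP-CM algorithm.

```python
from math import dist


def participation_index(points, feat_a, feat_b, radius):
    """Participation index of the feature pair {feat_a, feat_b}.

    points is a list of (feature_type, (x, y)) tuples. Two instances are
    treated as neighbors when their Euclidean distance is <= radius
    (an assumed neighbor relation). Hypothetical helper for illustration.
    """
    a = [p for f, p in points if f == feat_a]
    b = [p for f, p in points if f == feat_b]
    if not a or not b:
        return 0.0
    # Fraction of A instances that have at least one B neighbor, and vice versa.
    a_ratio = sum(any(dist(p, q) <= radius for q in b) for p in a) / len(a)
    b_ratio = sum(any(dist(q, p) <= radius for p in a) for q in b) / len(b)
    # The pair is only as prevalent as its least-participating feature.
    return min(a_ratio, b_ratio)


# Made-up sample data: three A instances, two B instances, one C instance.
sample = [
    ("A", (0.0, 0.0)), ("B", (0.5, 0.0)),
    ("A", (5.0, 5.0)), ("B", (5.0, 5.5)),
    ("A", (10.0, 10.0)), ("C", (20.0, 20.0)),
]
```

With this sample and `radius=1.0`, two of the three A instances have a B neighbor and both B instances have an A neighbor, so the index for the pair is 2/3. Note that no transactions are involved; neighborhoods take their place, which is the difficulty the abstract points out.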
8	Abnormal Pattern Recognition in Spatial Data Kou, Yufeng 26 January 2007 (has links) In recent years, abnormal spatial pattern recognition has received a great deal of attention from both industry and academia, and has become an important branch of data mining. Abnormal spatial patterns, or spatial outliers, are those observations whose characteristics are markedly different from their spatial neighbors. The identification of spatial outliers can be used to reveal hidden but valuable knowledge in many applications. For example, it can help locate extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumor cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs, and detect water pollution incidents. Numerous traditional outlier detection methods have been developed, but they cannot be directly applied to spatial data in order to extract abnormal patterns. Traditional outlier detection mainly focuses on "global comparison" and identifies deviations from the remainder of the entire data set. In contrast, spatial outlier detection concentrates on discovering neighborhood instabilities that break the spatial continuity. In recent years, a number of techniques have been proposed for spatial outlier detection. However, they have the following limitations. First, most of them focus primarily on single-attribute outlier detection. Second, they may not accurately locate outliers when multiple outliers exist in a cluster and correlate with each other. Third, the existing algorithms tend to abstract spatial objects as isolated points and do not consider their geometrical and topological properties, which may lead to inexact results. This dissertation reports a study of the problem of abnormal spatial pattern recognition, and proposes a suite of novel algorithms. 
Contributions include: (1) formal definitions of various spatial outliers, including single-attribute outliers, multi-attribute outliers, and region outliers; (2) a set of algorithms for the accurate detection of single-attribute spatial outliers; (3) a systematic approach to identifying and tracking region outliers in continuous meteorological data sequences; (4) a novel Mahalanobis-distance-based algorithm to detect outliers with multiple attributes; (5) a set of graph-based algorithms to identify point outliers and region outliers; and (6) extensive analysis of experiments on several spatial data sets (e.g., West Nile virus data and NOAA meteorological data) to evaluate the effectiveness and efficiency of the proposed algorithms. / Ph. D. image segmentation similarity search change detection pattern recognition spatial outlier detection spatial data mining
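The contrast this abstract draws between global comparison and neighborhood comparison can be sketched with a simple single-attribute detector: each site is compared with the average of its spatial neighbors, and the differences are then standardized. This is a generic baseline under stated assumptions (the function name `spatial_outlier_scores`, the dictionary-based inputs, and the sample values are hypothetical); it is not one of the algorithms proposed in the dissertation.

```python
from statistics import mean, pstdev


def spatial_outlier_scores(values, neighbors):
    """Standardized neighborhood-difference score for each site.

    values maps site id -> attribute value; neighbors maps site id -> list
    of neighboring site ids (every site is assumed to have at least one).
    A large score means the site differs strongly from its neighborhood.
    Hypothetical helper for illustration.
    """
    # Difference between each site and the mean of its spatial neighbors.
    diffs = {i: values[i] - mean(values[j] for j in neighbors[i]) for i in values}
    mu = mean(diffs.values())
    sigma = pstdev(diffs.values())
    if sigma == 0:
        return {i: 0.0 for i in diffs}
    return {i: abs(d - mu) / sigma for i, d in diffs.items()}


# Made-up sample: seven sites on a line; site 3 carries an abnormal reading.
readings = {0: 10, 1: 11, 2: 10, 3: 50, 4: 11, 5: 10, 6: 11}
adjacency = {i: [j for j in (i - 1, i + 1) if j in readings] for i in readings}
scores = spatial_outlier_scores(readings, adjacency)
```

In this sample, site 3 receives the highest score, but its neighbors (sites 2 and 4) also score well above the rest because the outlier distorts their neighborhood averages. That side effect is related to the second limitation listed in the abstract.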
9	Co-Location Decision Tree for Enhancing Decision-Making of Pavement Maintenance and Rehabilitation Zhou, Guoqing 02 March 2011 (has links) A pavement management system (PMS) is a valuable tool and one of the critical elements of the highway transportation infrastructure. Since a vast amount of pavement data is frequently and continuously being collected, updated, and exchanged due to rapidly deteriorating road conditions, increased traffic loads, and shrinking funds, resulting in the rapid accumulation of a large pavement database, knowledge-based expert systems (KBESs) have been developed to solve various transportation problems. This dissertation presents the development of theory and algorithm for a new decision tree induction method, called co-location-based decision tree (CL-DT). This method will enhance the decision-making abilities of pavement maintenance personnel and their rehabilitation strategies. This idea stems from shortcomings in traditional decision tree induction algorithms when applied to pavement treatment strategies. The proposed algorithm utilizes the co-location (co-occurrence) characteristics of spatial attribute data in the pavement database. With the proposed algorithm, one distinct event occurrence can associate with two or more attribute values that occur simultaneously in spatial and temporal domains. This dissertation describes the details of the proposed CL-DT algorithm and the steps of realizing it. First, the dissertation research describes the detailed co-location mining algorithm, including spatial attribute data selection in pavement databases, the determination of candidate co-locations, the determination of table instances of candidate co-locations, pruning the non-prevalent co-locations, and induction of co-location rules. In this step, a hybrid constraint, i.e., a spatial geometric distance constraint condition and a distinct event-type constraint condition, is developed. 
The spatial geometric distance constraint condition is a neighborhood relationship-based spatial join of table instances for many prevalent co-locations with one prevalent co-location; and the distinct event-type constraint condition is a Euclidean distance between a set of attributes and its corresponding cluster center of attributes. The dissertation research also developed the spatial feature pruning method using the multi-resolution pruning criterion. The cross-correlation criterion of spatial features is used to remove the non-prevalent co-locations from the candidate prevalent co-location set under a given threshold. The dissertation research focused on the development of the co-location decision tree (CL-DT) algorithm, which includes the non-spatial attribute data selection in the pavement management database, co-location algorithm modeling, node merging criteria, and co-location decision tree induction. In this step, co-location mining rules are used to guide the decision tree generation and induce decision rules. For each step, this dissertation gives detailed flowcharts, such as the flowchart of co-location decision tree induction, the co-location/co-occurrence decision tree algorithm, the algorithm of co-location/co-occurrence decision tree (CL-DT), and an outline of the steps of the SFS (Sequential Feature Selection) algorithm. Finally, this research used a pavement database covering four counties, which is provided by NCDOT (North Carolina Department of Transportation), to verify and test the proposed method. Comparison analyses of the different rehabilitation treatments proposed by NCDOT, by the traditional DT induction algorithm, and by the proposed new method are conducted. 
Findings and conclusions include: (1) traditional DT technology can make a consistent decision for road maintenance and rehabilitation strategy under the same road conditions, i.e., with less interference from human factors; (2) the traditional DT technology can increase the speed of decision-making because the technology automatically generates a decision tree and rules if the expert knowledge is given, which saves time and expenses for PMS; (3) integration of the DT and GIS can provide the PMS with the capabilities of graphically displaying treatment decisions, visualizing the attribute and non-attribute data, and linking data and information to the geographical coordinates. However, the traditional DT induction methods are not quite as intelligent as one might expect. Thus, post-processing and refinement are necessary. Moreover, traditional DT induction methods for pavement M&R strategies only used the non-spatial attribute data. This dissertation research has demonstrated that the spatial data is very useful for the improvement of decision-making processes for pavement treatment strategies. In addition, the decision trees are based on the knowledge acquired from pavement management engineers for strategy selection. Thus, different decision trees can be built if the requirement changes. / Ph. D. Maintenance and Rehabilitation Decision Tree Spatial Data Mining Co-Location Pavement Management GIS
10	Spatiotemporal Event Forecasting and Analysis with Ubiquitous Urban Sensors Fu, Kaiqun 13 July 2021 (has links) The study of information extraction and knowledge exploration in the urban environment is gaining popularity. Ubiquitous sensors and a plethora of statistical reports provide an immense amount of heterogeneous urban data, such as traffic data, crime activity statistics, social media messages, and street imagery. The development of methods for heterogeneous urban data-based event identification and impact analysis for a variety of event topics and assumptions is the subject of this dissertation. A graph convolutional neural network for crime prediction, a multitask learning system for traffic incident prediction with spatiotemporal feature learning, social media-based transportation event detection, and a graph convolutional network-based cyberbullying detection algorithm are the four methods proposed. Additionally, based on the sensitivity of these urban sensor data, a comprehensive discussion on ethical issues of urban computing is presented. This work makes the following contributions in urban perception predictions: 1) creating a preference learning system for inferring crime rankings from street view images using a bidirectional convolutional neural network (bCNN); 2) proposing a graph convolutional network-based solution to the current urban crime perception problem; and 3) developing street view image retrieval algorithms to demonstrate real city perception. This work also makes the following contributions in traffic incident effect analysis: 1) developing a novel machine learning system for predicting traffic incident duration using temporal features; 2) modeling traffic speed similarity among road segments using spatial connectivity in feature space; and 3) proposing a sparse feature learning method for identifying groups of temporal features at a higher level. 
In transportation-related incident detection, this work makes the following contributions: 1) creating a real-time social media-based traffic incident detection platform; 2) proposing a query expansion algorithm for traffic-related tweets; and 3) developing a text summarization tool for redundant traffic-related tweets. Cyberbullying detection from social media platforms is one of the major focuses of this work: 1) developing an online Dynamic Query Expansion (DQE) process using concatenated keyword search; 2) formulating a graph structure of tweet embeddings and implementing a Graph Convolutional Network for fine-grained cyberbullying classification; and 3) curating a balanced multiclass cyberbullying dataset from DQE and making it publicly available. Additionally, this work seeks to identify ethical vulnerabilities from three primary research directions of urban computing: urban safety analysis, urban transportation analysis, and social media analysis for urban events. Visions for future improvements from the perspective of ethics are addressed. / Doctor of Philosophy / The ubiquitously deployed urban sensors such as traffic speed meters, street-view cameras, and even smartphones in everybody's pockets are generating terabytes of data every hour. How to refine valuable intelligence out of such explosions of urban data and information has become one of the profitable questions in the field of data mining and urban computing. In this dissertation, four innovative applications are proposed to solve real-world problems with big data from urban sensors. In addition, the foreseeable ethical vulnerabilities in the research fields of urban computing and event prediction are addressed. The first work explores the connection between urban perception and crime inferences. StreetNet is proposed to learn crime rankings from street view images. This work presents the design of a street view image retrieval algorithm to improve the representation of urban perception. 
A data-driven, spatiotemporal algorithm is proposed to find unbiased label mappings between the street view images and the crime ranking records. The second work proposes a traffic incident duration prediction model that simultaneously predicts the impact of the traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. Such functionality provided by this model is helpful for the transportation operators and first responders to judge the influences of traffic incidents. In the third work, a social media-based traffic status monitoring system is established. The system is initiated by a transportation-related keyword generation process. A state-of-the-art tweets summarization algorithm is designed to eliminate the redundant tweets information. In addition, we show that the proposed tweets query expansion algorithm outperforms the previous methods. The fourth work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim's age, ethnicity, gender, religion, or other quality. This work represents a step forward for establishing an active anti-cyberbullying presence in social media and a step forward towards a future without cyberbullying. Finally, a discussion of the ethical issues in the urban computing community is addressed. This work seeks to identify ethical vulnerabilities from three primary research directions of urban computing: urban safety analysis, urban transportation analysis, and social media analysis for urban events. Visions for future improvements in the perspective of ethics are pointed out. spatial data mining urban computing urban perception event detection Machine learning
