1 |
Statistical Geocomputing: Spatial Outlier Detection in Precision Agriculture
Chu Su, Peter 29 September 2011
The collection of crop yield data has become much easier with the introduction of technologies such as the Global Positioning System (GPS), ground-based yield sensors, and Geographic Information Systems (GIS). The explosive growth and widespread use of spatial data have made it harder to derive useful spatial knowledge. In addition, outlier detection, an important pre-processing step, remains a challenge because the choice of technique and the definition of the spatial neighbourhood are non-trivial, and because quantitative assessments of false positives and false negatives and the concept of a region outlier remain unexplored. The overall aim of this study is to evaluate different spatial outlier detection techniques in terms of their accuracy and computational efficiency, and to examine the performance of these outlier removal techniques in a site-specific management context.
In a simulation study, unconditional sequential Gaussian simulation is used to generate crop yield as the response variable along with two explanatory variables. Point and region spatial outliers are added to the simulated datasets by randomly selecting observations and adding or subtracting a Gaussian error term. Because the simulated data contain spatial outliers that are known in advance, the assessment of spatial outlier techniques can be conducted as a binary classification exercise in which each spatial outlier detection technique is treated as a classifier. Algorithm performance is evaluated with the area and partial area under the ROC curve up to different true positive and false positive rates. Outlier effects in on-farm research are assessed in terms of the influence of each spatial outlier technique on coefficient estimates from a spatial regression model that accounts for autocorrelation.
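The evaluation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's procedure: a smooth trigonometric surface stands in for sequential Gaussian simulation, only point outliers are injected, the detector is a simple difference from the mean of the k nearest neighbours, and only the full area under the ROC curve is computed. The names `knn_indices` and `roc_auc`, the grid size, and all parameter values are hypothetical.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def roc_auc(scores, labels):
    """AUC as the probability that an outlier outscores a normal point."""
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

rng = np.random.default_rng(0)
side = 20
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

# Assumption: a smooth surface plus small noise stands in for simulated yield.
crop_yield = (np.sin(coords[:, 0] / 4) + np.cos(coords[:, 1] / 5)
              + rng.normal(0, 0.05, side * side))

# Inject known point outliers by adding or subtracting a Gaussian error term.
labels = np.zeros(side * side, dtype=bool)
labels[rng.choice(side * side, size=20, replace=False)] = True
n_out = int(labels.sum())
crop_yield[labels] += rng.normal(3.0, 0.5, n_out) * rng.choice([-1, 1], n_out)

# Treat the detector as a classifier: score each point, then compute the AUC.
nbrs = knn_indices(coords, k=8)
scores = np.abs(crop_yield - crop_yield[nbrs].mean(axis=1))
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Because the true outlier labels are known, any detector that returns a score per observation can be dropped in place of `scores` and compared on the same footing.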
Results indicate that for point outliers, spatial outlier techniques that account for spatial autocorrelation tend to outperform standard spatial outlier techniques, with higher sensitivity, a lower false positive detection rate, and more consistent performance. They are also more robust to changes in the neighbourhood definition. For region outliers, standard techniques tend to outperform spatial autocorrelation techniques in all performance aspects because they are less affected by masking and swamping effects. One spatial autocorrelation technique in particular, Averaged Difference, is superior to all other techniques in both the point and region outlier scenarios because it incorporates spatial autocorrelation while also revealing the variation between nearest neighbours.
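The abstract does not define Averaged Difference, so the following is only one plausible reading of the name: score a point by the (optionally weighted) average of its absolute differences from each neighbour, rather than by its difference from the neighbourhood mean. The function names and the toy values are hypothetical. The sketch shows why averaging pairwise differences can reveal variation between nearest neighbours that a mean-based score hides.

```python
import numpy as np

def mean_difference(value, neighbour_values):
    """Absolute difference between a value and the mean of its neighbours."""
    return abs(value - np.mean(neighbour_values))

def averaged_difference(value, neighbour_values, weights=None):
    """Average of absolute differences to each neighbour (assumed definition).

    `weights` could hold, for example, inverse distances; None means equal weights.
    """
    diffs = np.abs(value - np.asarray(neighbour_values, dtype=float))
    return np.average(diffs, weights=weights)

# Neighbours alternate between low and high values; their mean equals the
# centre value, so the mean-based score cancels out while the averaged one does not.
neighbours = [0.0, 10.0, 0.0, 10.0]
md = mean_difference(5.0, neighbours)
ad = averaged_difference(5.0, neighbours)
print(md, ad)
```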
In terms of decision-making, all algorithms led to slightly different coefficient estimates and may therefore lead to different decisions for site-specific management.
The results outlined here will allow improved removal of potentially problematic crop yield data points. The study recommends the Averaged Difference algorithm for cleaning spatial outliers in yield datasets. Identifying the optimal number of nearest neighbours for the neighbourhood aggregation function is still non-trivial. The recommendation is to specify a large number of nearest neighbours, large enough to capture the region size. Lastly, the unbiased coefficient estimates obtained with Averaged Difference suggest it is the better method for pre-processing spatial outliers in crop yield data, which underlines its suitability for detecting spatial outliers in the context of on-farm research.
|
3 |
Abnormal Pattern Recognition in Spatial Data
Kou, Yufeng 26 January 2007
In recent years, abnormal spatial pattern recognition has received a great deal of attention from both industry and academia, and has become an important branch of data mining. Abnormal spatial patterns, or spatial outliers, are observations whose characteristics are markedly different from those of their spatial neighbors. The identification of spatial outliers can reveal hidden but valuable knowledge in many applications. For example, it can help locate extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumor cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs, and detect water pollution incidents.
Numerous traditional outlier detection methods have been developed, but they cannot be directly applied to spatial data in order to extract abnormal patterns. Traditional outlier detection mainly focuses on "global comparison" and identifies deviations from the remainder of the entire data set. In contrast, spatial outlier detection concentrates on discovering neighborhood instabilities that break the spatial continuity. In recent years, a number of techniques have been proposed for spatial outlier detection. However, they have the following limitations. First, most of them focus primarily on single-attribute outlier detection. Second, they may not accurately locate outliers when multiple outliers exist in a cluster and correlate with each other. Third, the existing algorithms tend to abstract spatial objects as isolated points and do not consider their geometrical and topological properties, which may lead to inexact results.
This dissertation reports a study of the problem of abnormal spatial pattern recognition, and proposes a suite of novel algorithms. Contributions include: (1) formal definitions of various spatial outliers, including single-attribute outliers, multi-attribute outliers, and region outliers; (2) a set of algorithms for the accurate detection of single-attribute spatial outliers; (3) a systematic approach to identifying and tracking region outliers in continuous meteorological data sequences; (4) a novel Mahalanobis-distance-based algorithm to detect outliers with multiple attributes; (5) a set of graph-based algorithms to identify point outliers and region outliers; and (6) extensive analysis of experiments on several spatial data sets (e.g., West Nile virus data and NOAA meteorological data) to evaluate the effectiveness and efficiency of the proposed algorithms. / Ph. D.
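As an illustration of the multi-attribute idea in contribution (4), a Mahalanobis-distance score can be sketched as follows. This is a generic sketch, not the dissertation's algorithm: it assumes each row of `diff` already holds a point's attribute vector minus the average attribute vector of its spatial neighbours, and it uses synthetic data. The name `mahalanobis_sq` and all values are hypothetical.

```python
import numpy as np

def mahalanobis_sq(diff):
    """Squared Mahalanobis distance of each row from the mean of all rows."""
    centred = diff - diff.mean(axis=0)
    cov = np.cov(diff, rowvar=False)      # sample covariance of the attributes
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

rng = np.random.default_rng(1)
# Assumption: rows are "attributes minus neighbourhood average" for 200 points.
diff = rng.normal(0.0, 1.0, size=(200, 2))
diff[0] = [6.0, -6.0]                     # one planted multi-attribute outlier

scores = mahalanobis_sq(diff)
most_outlying = int(np.argmax(scores))
print(most_outlying)
```

Scoring the neighbourhood differences jointly, rather than one attribute at a time, accounts for correlation between attributes when ranking candidate outliers.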
|