Global ETD Search

1	Sihui_Wang_thesis.pdf Sihui Wang (17522025) 01 December 2023 (has links) Traditionally, spatial data pertains to observations made at various spatial locations, with interpolation commonly being the central aim of such analyses. However, the relevance of this data has expanded notably to scenarios where the spatial location represents input variables, and the observed response variable embodies the model outcome, a concept applicable in arenas like computer experiments and recommender systems. Spatial prediction, pervasive across many disciplines, often employs linear prediction due to its simplicity. Kriging, originally developed in mining engineering, has found utility in diverse fields such as environmental sciences, hydrology, natural resources, remote sensing, and computer experiments, among others. In these applications, Gaussian processes have emerged as a powerful tool. Essential for kernel learning methods in machine learning, Kriging necessitates the inversion of the covariance matrix related to the observed random variables. A primary challenge in spatial data analysis, in this expansive sense, is handling the large covariance matrix involved in the best linear prediction, or Kriging, and the Gaussian likelihood function. Recent studies have revealed that the covariance matrix can become ill-conditioned with increasing dimensions. This revelation underscores the need to seek alternative methodologies for analyzing extensive spatial data that avoid relying on the full covariance matrix. Although various strategies, such as covariance tapering, use of block diagonal matrices, and traditional low-rank models with perturbation, have been proposed to combat the computational hurdles linked with large spatial data, not all effectively resolve the predicament of an ill-conditioned covariance matrix. In this thesis, we examine two promising strategies for the analysis of large-scale spatial data.
The first is the low-rank approximation, a tactic that exists in multiple forms. Traditional low-rank models employ perturbation to handle the ill-conditioned covariance matrix but fall short in data prediction accuracy. We propose the use of a pseudo-inverse for the low-rank model as an alternative to full Kriging in handling massive spatial data. We will demonstrate that the prediction variance of the proposed low-rank model can be comparable to that of full Kriging, while offering computational cost benefits. Furthermore, our proposed low-rank model surpasses the traditional low-rank model in data interpolation. Consequently, when full Kriging is untenable due to an ill-conditioned covariance matrix, our proposed low-rank model becomes a viable alternative for interpolating large spatial data sets with high precision. The second strategy involves harnessing deep learning for spatial interpolation. We explore machine learning approaches adept at modeling voluminous spatial data. Contrary to the majority of existing research that applies deep learning exclusively to model the mean function in spatial data, we concentrate on encapsulating spatial correlation. This approach harbors potential for effectively modeling non-stationary spatial phenomena. Given that Kriging is predicated on the data being influenced by an unknown constant mean, serving as the best linear unbiased predictor under this presupposition, we foresee its superior performance in stationary cases. Conversely, DeepKriging, with its intricate structure for both the mean function and spatial basis functions, exhibits enhanced performance in the realm of nonstationary data. Spatial statistics Spatial Statistics method
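The pseudo-inverse idea in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis's model: it assumes a zero-mean process, a squared-exponential covariance, and a low-rank approximation obtained by keeping the largest eigenvalues of the covariance matrix. The function names, the length scale, and the rank are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_cov(a, b, length_scale=0.3):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale**2))


def lowrank_pinv_predict(x_obs, y_obs, x_new, rank=8, length_scale=0.3):
    """Zero-mean linear predictor using a rank-`rank` pseudo-inverse of K.

    Assumption: the low-rank model is the truncated eigendecomposition of K,
    which avoids inverting the full, possibly ill-conditioned, matrix.
    """
    K = gaussian_cov(x_obs, x_obs, length_scale)   # n x n covariance of observations
    k0 = gaussian_cov(x_new, x_obs, length_scale)  # m x n cross-covariance
    w, V = np.linalg.eigh(K)                       # eigenvalues in ascending order
    w, V = w[-rank:], V[:, -rank:]                 # keep the `rank` largest
    K_pinv = (V / w) @ V.T                         # pseudo-inverse of the rank-r part
    return k0 @ (K_pinv @ y_obs)


# Small demonstration on a smooth signal observed at 50 locations.
x_obs = np.linspace(0.0, 1.0, 50)
y_obs = np.sin(np.pi * x_obs)
x_new = np.array([0.13, 0.5, 0.87])
print(lowrank_pinv_predict(x_obs, y_obs, x_new))
```

Only the retained eigenvalues are inverted, so the near-zero eigenvalues that make the full covariance matrix ill-conditioned never enter the computation.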
2	Point process modelling in environmental epidemiology Morris, Sara January 1995 (has links) No description available. 519.5 Spatial statistics; Clusters
3	Applying an Intrinsic Conditional Autoregressive Reference Prior for Areal Data Porter, Erica May 09 July 2019 (has links) Bayesian hierarchical models are useful for modeling spatial data because they have flexibility to accommodate complicated dependencies that are common to spatial data. In particular, intrinsic conditional autoregressive (ICAR) models are commonly assigned as priors for spatial random effects in hierarchical models for areal data corresponding to spatial partitions of a region. However, selection of prior distributions for these spatial parameters presents a challenge to researchers. We present and describe ref.ICAR, an R package that implements an objective Bayes intrinsic conditional autoregressive prior on a vector of spatial random effects. This model provides an objective Bayesian approach for modeling spatially correlated areal data. ref.ICAR enables analysis of spatial areal data for a specified region, given user-provided data and information about the structure of the study region. The ref.ICAR package performs Markov Chain Monte Carlo (MCMC) sampling and outputs posterior medians, intervals, and trace plots for fixed effect and spatial parameters. Finally, the functions provide regional summaries, including medians and credible intervals for fitted values by subregion. / Master of Science / Spatial data is increasingly relevant in a wide variety of research areas. Economists, medical researchers, ecologists, and policymakers all make critical decisions about populations using data that naturally display spatial dependence. One such data type is areal data; data collected at county, habitat, or tract levels are often spatially related. Most convenient software platforms provide analyses for independent data, as the introduction of spatial dependence increases the complexity of corresponding models and computation. 
Use of analyses with an independent data assumption can lead researchers and policymakers to make incorrect, simplistic decisions. Bayesian hierarchical models can be used to effectively model areal data because they have flexibility to accommodate complicated dependencies that are common to spatial data. However, use of hierarchical models increases the number of model parameters and requires specification of prior distributions. We present and describe ref.ICAR, an R package available to researchers that automatically implements an objective Bayesian analysis that is appropriate for areal data. Bayesian Analysis Spatial Statistics
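For readers unfamiliar with the ICAR structure mentioned in the abstract above, the following Python sketch builds the standard ICAR precision structure Q = D - W from a neighbourhood adjacency matrix W, where D holds the neighbour counts. It does not reproduce the ref.ICAR package (which is an R package) or its reference prior; the function names are hypothetical.

```python
import numpy as np


def icar_structure(adjacency):
    """Return Q = D - W for a symmetric 0/1 adjacency matrix W."""
    W = np.asarray(adjacency, dtype=float)
    D = np.diag(W.sum(axis=1))  # number of neighbours of each region
    return D - W


def icar_quadratic_form(phi, adjacency):
    """phi' Q phi, which equals the sum of squared differences over neighbour pairs."""
    phi = np.asarray(phi, dtype=float)
    return float(phi @ icar_structure(adjacency) @ phi)


# Three regions in a row: region 1 borders 2, region 2 borders 3.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(icar_structure(W))
print(icar_quadratic_form([1.0, 3.0, 6.0], W))
```

Each row of Q sums to zero, so Q is singular. This is why the ICAR prior is improper and why the choice of priors on the remaining parameters needs care.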
4	Geological Effects on Lightning Strike Distributions Berdahl, J. Scott 16 May 2016 (has links) Recent advances in lightning detection networks allow for detailed mapping of lightning flash locations. Longstanding rumors of geological influence on cloud-to-ground (CG) lightning distribution and recent commercial claims based on such influence can now be tested empirically. If present, such influence could represent a new, cheap and efficient geophysical tool with applications in mineral, hydrothermal and oil exploration, regional geological mapping, and infrastructure planning. This project applies statistical analysis to lightning data collected by the United States National Lightning Detection Network from 2006 through 2015 in order to assess whether the huge range in electrical conductivities of geological materials plays a role in the spatial distribution of CG lightning. CG flash densities are mapped for twelve areas in the contiguous United States and compared to elevation and geology, as well as to the locations of faults, railroads and tall towers including wind turbines. Overall spatial randomness is assessed, along with spatial correlation of attributes. Negative and positive polarity lightning are considered separately and together. Topography and tower locations show a strong influence on CG distribution patterns. Geology, faults and railroads do not. This suggests that ground conductivity is not an important factor in determining lightning strike location on scales larger than current flash location accuracies, which are generally several hundred meters. Once a lightning channel is established, however, ground properties at the contact point may play a role in determining properties of the subsequent stroke. Lightning Geology Spatial statistics Conductivity
5	A generic similarity test for spatial data Kirsten, René January 2020 (has links) Two spatial data sets are considered to be similar if they originate from the same stochastic process in terms of their spatial structure. Many tests have been developed over recent years to test the similarity of certain types of spatial data, such as spatial point patterns, geostatistical data and images. This research develops a similarity test able to handle various types of spatial data, for example images (modelled spatially), point patterns, marked point patterns, geostatistical data and lattice patterns. The test consists of three steps. The first step creates a pixel image representation of each spatial data set considered. In the second step a local similarity map is created from the two pixel image representations from step one. The local similarity map is obtained by either using the well-known similarity measure for images called the Structural SIMilarity Index (SSIM) when having continuous pixel values or a direct comparison in the case of discrete pixel values. The calculation of the final similarity measure is done in the third step of the test. This calculation is based on the S-index of Andresen's spatial point pattern test. The S-index is calculated as the proportion of similar spatial units in the domain where s_i is used as a binary indicator of similarity. In the case of discrete pixel values, s_i are still used as a binary input whereas in the case of continuous pixel values the resulting SSIM values are used as a non-binary s_i input. The proposed spatial similarity test is tested with a simulation study where the simulations are designed to have comparisons that are either 80% or 90% identical. With the simulation study it is concluded that the test is not sensitive to the resolution of the pixel image. The application is done on property valuations in Johannesburg and Cape Town. 
The test is applied to the similarity of property prices in the same area over different years as well as testing the similarity of property prices between the different areas of properties. / Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2020. / The financial assistance of the National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at, are those of the author and are not necessarily to be attributed to the NRF. / Statistics / MSc (Advanced Data Analytics) / Unrestricted UCTD Mathematical Statistics Spatial Statistics
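The second and third steps described in the abstract above can be sketched in Python for the discrete-pixel case. This is an illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, and the continuous case is represented only by passing a precomputed local SSIM map (for instance one obtained from scikit-image's `structural_similarity` with `full=True`) to the same averaging step.

```python
import numpy as np


def discrete_similarity_map(img_a, img_b):
    """Binary local similarity map: s_i = 1 where the two pixel values match."""
    return (np.asarray(img_a) == np.asarray(img_b)).astype(float)


def s_index(similarity_map):
    """Average of the s_i values over all spatial units.

    With binary s_i this is the proportion of similar units; with a
    continuous map (e.g. local SSIM values) it is their mean.
    """
    return float(np.asarray(similarity_map, dtype=float).mean())


# Two 10 x 10 pixel images that agree on 80 of 100 pixels.
a = np.zeros((10, 10), dtype=int)
b = a.copy()
b.flat[:20] = 1
print(s_index(discrete_similarity_map(a, b)))
```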
6	Spatial dependency between a linear network and a point pattern Kunene, Thembinkosi January 2020 (has links) In this mini-dissertation we discuss the spatial relationship between point processes and a linear network. As a starting point, we discuss basic spatial point processes and tests for first-order homogeneity. Following that, we discuss second-order properties of point processes in the form of Ripley's K-function for unmarked point patterns and the cross-K function for marked point patterns. We then get to the main focus of this mini-dissertation, that is, the spatial relationship between points and linear structures, particularly linear networks. Comas et al. [13] recently developed a method to characterise the spatial relationship between points and linear networks, similar to Ripley's K-function for point-to-point relationships. The non-stationarity of a linear network is of particular interest in how it affects the measurement of this spatial relationship, which has not been explicitly investigated in the literature before. To investigate this we consider the Poisson line process and how one might simulate a non-stationary line process. Furthermore, we discuss a mechanism to extend tests of first-order homogeneity of point patterns to line patterns. The non-stationary line process is used to model linear networks in the simulations conducted to determine the effect of this non-stationarity on the developed method, which was not covered in the original article [13]. The methodology is developed and tested on a real data set. / Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2020. / ESRI South Africa / Statistics / MSc (Advanced Data Analytics) / Unrestricted UCTD Mathematical Statistics Spatial Statistics
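As background for the abstract above, a naive estimator of Ripley's K-function for an unmarked point pattern can be sketched in Python. This covers only the point-to-point case and is not the point-to-network method of Comas et al. It also assumes no edge correction, so estimates are biased downward for radii that are large relative to the study window. The function name is hypothetical.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive K-hat(r) = area / (n (n - 1)) * #{ordered pairs i != j with d_ij <= r}."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))        # n x n pairwise distances
    off_diag = dist[~np.eye(n, dtype=bool)]       # drop the zero self-distances
    counts = np.array([(off_diag <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))


# Under complete spatial randomness K(r) is pi * r**2, so an estimate can be
# compared against that curve to look for clustering or inhibition.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(200, 2))
radii = np.array([0.02, 0.05, 0.10])
print(ripley_k(pts, radii, area=1.0))
print(np.pi * radii**2)
```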
7	Bayesian Model Selection for Spatial Data and Cost-constrained Applications Porter, Erica May 03 July 2023 (has links) Bayesian model selection is a useful tool for identifying an appropriate model class, dependence structure, and valuable predictors for a wide variety of applications. In this work we consider objective Bayesian model selection where no subjective information is available to inform priors on model parameters a priori, specifically in the case of hierarchical models for spatial data, which can have complex dependence structures. We develop an approach using trained priors via fractional Bayes factors where standard Bayesian model selection methods fail to produce valid probabilities under improper reference priors. This enables researchers to concurrently determine whether spatial dependence between observations is apparent and identify important predictors for modeling the response. In addition to model selection with objective priors on model parameters, we also consider the case where the priors on the model space are used to penalize individual predictors a priori based on their costs. We propose a flexible approach that introduces a tuning parameter to cost-penalizing model priors that allows researchers to control the level of cost penalization to meet budget constraints and accommodate increasing sample sizes. / Doctor of Philosophy / Spatial data, such as data collected over a geographic region, is relevant in many fields. Spatial data can require complex models to study, but use of these models can impose unnecessary computations and increased difficulty for interpretation when spatial dependence is weak or not present. We develop a method to simultaneously determine whether a spatial model is necessary to understand the data and choose important variables associated with the outcome of interest. 
Within a class of simpler, linear models, we propose a technique to identify important variables associated with an outcome when there exists a budget or general desire to minimize the cost of collecting the variables. Spatial statistics Bayesian model selection
8	Bayesian Logistic Regression with Spatial Correlation: An Application to Tennessee River Pollution Marjerison, William M 15 December 2006 (has links) "We analyze data (length, weight and location) from a study done by the Army Corps of Engineers along the Tennessee River basin in the summer of 1980. The purpose is to predict the probability that a hypothetical channel catfish at a location studied is toxic and contains 5 ppm or more DDT in its filet. We incorporate spatial information and treat it separately from other covariates. Ultimately, we want to predict the probability that a catfish from the unobserved location is toxic. In a preliminary analysis, we examine the data for observed locations using frequentist logistic regression, Bayesian logistic regression, and Bayesian logistic regression with random effects. Later we develop a parsimonious extension of Bayesian logistic regression and the corresponding Gibbs sampler for that model to increase computational feasibility and reduce model parameters. Furthermore, we develop a Bayesian model to impute data for locations where catfish were not observed. A comparison is made between results obtained fitting the model to only observed data and data with missing values imputed. Lastly, a complete model is presented which imputes data for missing locations and calculates the probability that a catfish from the unobserved location is toxic at once. We conclude that length and weight of the fish have negligible effect on toxicity. Toxicity of these catfish is mostly explained by location and spatial effects. In particular, the probability that a catfish is toxic decreases as one moves further downstream from the source of pollution." logistic regression Bayesian statistics MCMC spatial statistics
9	A Novel Count Weighted Wilcoxon Rank-Sum Test and Application to Medical Data Cong, Xinyu January 2022 (has links) No description available. Biostatistics non-parametric spatial statistics spreading depolarization count
10	A Spatial Statistical Analysis to Estimate the Spatial Dynamics of the 2009 H1N1 Pandemic in the Greater Toronto Area Fan, WENYONG 05 November 2012 (has links) The 2009 H1N1 pandemic caused serious concerns worldwide due to the novel biological features of the virus strain and the high morbidity rate among youth. The urban scale is crucial for analyzing the pandemic in metropolitan areas such as the Greater Toronto Area (GTA) of Canada because of its large population. The challenge of exploring the spatial dynamics of H1N1 is exacerbated by data scarcity and the absence of an immediately applicable methodology at such a scale. In this study, a stepwise methodology is developed, and a retrospective spatial statistical analysis is conducted using the methodology to estimate the spatial dynamics of the 2009 H1N1 pandemic in the GTA under data scarcity. Global and local spatial autocorrelation analyses are carried out using multiple spatial analysis tools to confirm the existence and significance of spatial clustering effects. A Generalized Linear Mixed Model (GLMM) implemented in Statistical Analysis System (SAS) is used to estimate the area-specific spatial dynamics. The GLMM is configured both as a spatial model that incorporates an Intrinsic Gaussian Conditionally Autoregressive (ICAR) model and as a non-spatial model. Comparing the results of the spatial and non-spatial configurations of the GLMM suggests that the spatial GLMM, which incorporates the ICAR model, provides better predictive performance. This indicates that the methodology developed in this study can be applied to epidemiology studies to analyze the spatial dynamics in similar scenarios. / Thesis (Master, Geography) -- Queen's University, 2012-10-30 17:41:28.445
