11

Multiple imputation in the presence of a detection limit, with applications : an empirical approach / Shawn Carl Liebenberg

Liebenberg, Shawn Carl January 2014 (has links)
Scientists often encounter unobserved or missing measurements that are typically reported as less than a fixed detection limit. This especially occurs in the environmental sciences, where detection of low exposures is not possible due to limitations of the measuring instrument, and the resulting data are often referred to as type I and type II left-censored data. Observations lying below this detection limit are therefore often ignored, or `guessed', because they cannot be measured accurately. However, reliable estimates of the population parameters are nevertheless required to perform statistical analysis. The problem of dealing with values below a detection limit becomes increasingly complex when a large number of observations are present below this limit. Researchers therefore have an interest in developing statistically robust estimation procedures for dealing with left- or right-censored data sets (Singh and Nocerino, 2002). This study focuses on several main components regarding the problems mentioned above. The imputation of censored data below a fixed detection limit is studied, particularly using the maximum likelihood procedure of Cohen (1959), and several variants thereof, in combination with four new variations of the multiple imputation concept found in the literature. Furthermore, the focus also falls strongly on estimating the density of the resulting imputed, `complete' data set by applying various kernel density estimators. It should be noted that bandwidth selection issues are not addressed in this study and are left for further research. The maximum likelihood estimation method of Cohen (1959) is compared with several variant methods to establish which of these maximum likelihood estimation procedures for censored data estimates the population parameters of three chosen Lognormal distributions most reliably, in terms of well-known discrepancy measures. These methods are implemented in combination with four new multiple imputation procedures, respectively, to assess which of these nonparametric methods is most effective at imputing the censored values below the detection limit with regard to the global discrepancy measures mentioned above. Several variations of the Parzen-Rosenblatt kernel density estimate are fitted to the complete, filled-in data sets obtained from the previous methods to establish which is the preferred data-driven method for estimating these densities. The primary focus of the study is therefore the performance of the four chosen multiple imputation methods, as well as the recommendation of methods and procedural combinations for dealing with data in the presence of a detection limit. An extensive Monte Carlo simulation study was performed to compare the various methods and procedural combinations, and conclusions and recommendations regarding the best of these are made based on the study's results. / MSc (Statistics), North-West University, Potchefstroom Campus, 2014
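A minimal sketch (not the author's code) of the two ideas combined in this abstract, under assumed parameter values: maximum-likelihood estimation for type I left-censored lognormal data in the spirit of Cohen (1959), followed by a simple multiple-imputation step that fills in the censored observations by drawing from the fitted distribution truncated at the detection limit.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)

# Simulated lognormal sample with a fixed detection limit (assumed values).
mu_true, sigma_true, dl = 0.0, 1.0, 0.5
x = rng.lognormal(mu_true, sigma_true, size=200)
observed = x[x >= dl]              # values above the detection limit
n_cens = int((x < dl).sum())       # only the count below the limit is known

def neg_loglik(theta):
    """Censored-data negative log-likelihood on the log scale."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll_obs = stats.norm.logpdf(np.log(observed), mu, sigma).sum()
    ll_cens = n_cens * stats.norm.logcdf((np.log(dl) - mu) / sigma)
    return -(ll_obs + ll_cens)

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Multiple imputation: each 'complete' data set replaces the censored values
# with draws from the fitted normal truncated above at log(dl).
M = 5
b = (np.log(dl) - mu_hat) / sigma_hat
imputed_sets = [
    np.concatenate([observed,
                    np.exp(stats.truncnorm.rvs(-np.inf, b, loc=mu_hat,
                                               scale=sigma_hat, size=n_cens,
                                               random_state=rng))])
    for _ in range(M)
]
```

Each of the M filled-in data sets can then be passed to a kernel density estimator, mirroring the Parzen-Rosenblatt step described in the abstract.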
13

Estimation of Kinetic Parameters From List-Mode Data Using an Indirect Approach

Ortiz, Joseph Christian January 2016 (has links)
This dissertation explores the possibility of using an imaging approach to model classical pharmacokinetic (PK) problems. The kinetic parameters, which describe the uptake rates of a drug within a biological system, are the parameters of interest. Knowledge of the drug uptake in a system is useful in expediting the drug development process, as well as in establishing a dosage regimen for patients. Traditionally, the uptake rate of a drug in a system is obtained by sampling the concentration of the drug in a central compartment, usually the blood, and fitting the data to a curve. In a system consisting of multiple compartments, the number of kinetic parameters is proportional to the number of compartments, and in classical PK experiments the number of identifiable parameters is less than the total number of parameters. Using an imaging approach to model classical PK problems, the support region of each compartment within the system is exactly known, and all the kinetic parameters become uniquely identifiable. To solve for the kinetic parameters, an indirect, two-part approach was used: first, the compartmental activity was obtained from the data, and then the kinetic parameters were estimated. The novel aspect of the research is the use of list-mode data to obtain the activity curves from a system, as opposed to a traditional binned approach. Using techniques from information-theoretic learning, particularly kernel density estimation, a non-parametric probability density function of the voltage outputs on each photomultiplier tube was generated on the fly for each event and used in a least-squares optimization routine to estimate the compartmental activity. The estimability of the activity curves was explored for varying noise levels and time-sample densities. Once an estimate of the activity was obtained, the kinetic parameters were estimated using multiple cost functions and then compared to each other using the mean squared error as the figure of merit.
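The second stage of the indirect approach (kinetic parameters recovered from previously estimated activity curves) can be illustrated with a hedged sketch; the one-tissue model, rate constants and noise level below are assumptions chosen for illustration, not the dissertation's actual system, and the list-mode/KDE stage that produces the activity estimates is not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

def one_tissue_activity(t, k_in, k_out):
    """Activity in a tissue compartment filled at rate k_in and cleared at rate k_out."""
    return (k_in / k_out) * (1.0 - np.exp(-k_out * t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 60.0, 30)                          # sample times (minutes, assumed)
true_k_in, true_k_out = 0.15, 0.05                      # assumed "true" kinetic parameters
activity = one_tissue_activity(t, true_k_in, true_k_out)
estimated_activity = activity + rng.normal(scale=0.05, size=t.size)  # stand-in for stage one

# Stage two: least-squares fit of the compartment model to the estimated activity curve.
(k_in_hat, k_out_hat), _ = curve_fit(one_tissue_activity, t, estimated_activity, p0=[0.1, 0.1])
```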
14

FITTING A DISTRIBUTION TO CATASTROPHIC EVENT

Osei, Ebenezer 15 December 2010 (has links)
Statistics is a branch of mathematics that is heavily employed in actuarial mathematics. This thesis first reviews the importance of statistical distributions in the analysis of insurance problems and the applications of statistics in the area of risk and insurance. The Normal, Log-normal, Pareto, Gamma, standard Beta, Frechet, Gumbel, Weibull, Poisson, binomial, and negative binomial distributions are examined, and the importance of these distributions in general insurance is emphasized. A careful review of the literature is carried out to provide practitioners in the general insurance industry with statistical tools of immediate application in the industry. These tools include estimation methods and fit statistics popular in the insurance industry. Finally, the thesis fits statistical distributions to flood loss data for the 50 states of the United States.
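A short, illustrative sketch of the fitting exercise described above, using simulated losses rather than the thesis's flood loss data: several candidate distributions are fitted by maximum likelihood and compared with a simple fit statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
losses = rng.lognormal(mean=10.0, sigma=1.5, size=500)   # simulated stand-in loss data

candidates = {
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "pareto": stats.pareto,
    "weibull": stats.weibull_min,
}

for name, dist in candidates.items():
    params = dist.fit(losses)                            # maximum-likelihood fit
    ks = stats.kstest(losses, dist.cdf, args=params)     # Kolmogorov-Smirnov fit statistic
    print(f"{name:10s} K-S statistic = {ks.statistic:.4f}")
```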
15

The k-Sample Problem When k is Large and n Small

Zhan, Dongling May 2012 (has links)
The k-sample problem, i.e., testing whether two or more data sets come from the same population, is a classic one in statistics. Instead of a small number k of groups of samples, this dissertation works with a large number p of groups of samples, where within each group the sample size, n, is a fixed, small number. We call this a "Large p, but Small n" setting. The primary goal of the research is to provide a test statistic based on kernel density estimation (KDE) that has an asymptotic normal distribution when p goes to infinity with n fixed. In this dissertation, we propose a test statistic called Tp(S) and its standardized version, T(S). By using T(S), we conduct our test based on the critical values of the standard normal distribution. Theoretically, we show that our test is invariant to location and scale transformations of the data. We also find conditions under which our test is consistent. Simulation studies show that our test has good power against a variety of alternatives. The real data analyses show that our test finds differences between gene distributions that are not due simply to location.
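The following sketch is purely illustrative of a KDE-based comparison in a "large p, small n" layout; it is not the Tp(S)/T(S) statistic constructed in the dissertation, and the data are simulated under the null of identical group distributions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
p, n = 200, 5                                     # many groups, each of fixed small size
groups = [rng.normal(0.0, 1.0, n) for _ in range(p)]

pooled = np.concatenate(groups)
pooled_kde = gaussian_kde(pooled)
grid = np.linspace(pooled.min(), pooled.max(), 256)
dx = grid[1] - grid[0]

# Sum over groups of the (approximate) integrated squared difference between
# each group's KDE and the pooled KDE; large values suggest unequal distributions.
stat = sum(np.sum((gaussian_kde(g)(grid) - pooled_kde(grid)) ** 2) * dx for g in groups)
```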
16

Choosing a Kernel for Cross-Validation

Savchuk, Olga 14 January 2010 (has links)
The statistical properties of cross-validation bandwidths can be improved by choosing an appropriate kernel, which is different from the kernels traditionally used for cross-validation purposes. In light of this idea, we developed two new methods of bandwidth selection, termed Indirect cross-validation and Robust one-sided cross-validation. The kernels used in the Indirect cross-validation method yield an improvement in the relative bandwidth rate to n^(-1/4), which is substantially better than the n^(-1/10) rate of the least squares cross-validation method. The robust kernels used in the Robust one-sided cross-validation method eliminate the bandwidth bias for the case of regression functions with discontinuous derivatives.
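For context, a hedged sketch of ordinary least-squares cross-validation (LSCV) with a Gaussian kernel, the baseline method the abstract improves upon; the indirect and robust one-sided variants replace the selection kernel and are not reproduced here.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.normal(size=100)                 # simulated sample
n = x.size
diffs = x[:, None] - x[None, :]          # pairwise differences

def lscv(h):
    """LSCV(h) = integral of fhat^2 minus twice the average leave-one-out density."""
    int_f2 = norm.pdf(diffs / h, scale=np.sqrt(2)).sum() / (n ** 2 * h)
    k = norm.pdf(diffs / h)
    loo_sum = (k.sum() - n * norm.pdf(0.0)) / ((n - 1) * h)   # sum_i fhat_{-i}(x_i)
    return int_f2 - 2.0 * loo_sum / n

h_lscv = minimize_scalar(lscv, bounds=(0.05, 2.0), method="bounded").x
```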
17

Geologic Factors Affecting Hydrocarbon Occurrence in Paleovalleys of the Mississippian-Pennsylvanian Unconformity in the Illinois Basin

London, Jeremy Taylor 01 May 2014 (has links)
Paleovalleys associated with the Mississippian-Pennsylvanian unconformity have been identified as potential targets for hydrocarbon exploration in the Illinois Basin. Though there is little literature addressing the geologic factors controlling hydrocarbon accumulation in sub-Pennsylvanian paleovalleys basin-wide, much work has been done to identify the Mississippian-Pennsylvanian unconformity, characterize the Chesterian and basal Pennsylvanian lithology, map the sub-Pennsylvanian paleogeology and delineate the pre-Pennsylvanian paleovalleys in the Illinois Basin. This study uses Geographic Information Systems (GIS) to determine the geologic factors controlling the distribution of hydrocarbon-bearing sub-Pennsylvanian paleovalley fill in the Illinois Basin. A methodology was developed to identify densely-drilled areas without associated petroleum occurrence in basal Pennsylvanian paleovalley fill. Kernel density estimation was used to approximate drilling activity throughout the basin and identify “hotspots” of high well density. Pennsylvanian oil and gas fields were compared to the hotspots to identify which areas were most likely unrelated to Pennsylvanian production. Those hotspots were then compared to areas with known hydrocarbon accumulations in sub-Pennsylvanian paleovalleys to determine what varies geologically amongst these locations. Geologic differences provided insight regarding the spatial distribution of hydrocarbon-bearing sub-Pennsylvanian paleovalleys in the Illinois Basin. It was found that the distribution of hydrocarbon-bearing paleovalleys in the Illinois Basin follows structural features and faults. In the structurally dominated portions of the Illinois Basin, especially in eastern Illinois along the La Salle Anticlinal Belt, hydrocarbons migrate into paleovalleys from underlying hydrocarbon-rich sub-Pennsylvanian paleogeology. Along the fault-dominated areas, such as the Wabash, Rough Creek and Pennyrile Fault Zones, migration occurs upwards along faults from deeper sources. Cross sections were made to gain a better understanding of the paleovalley reservoir and to assess the utility of using all the data collected in this study to locate paleovalley reservoirs. The Main Consolidated Field in Crawford County, Illinois, was chosen as the best site for subsurface mapping due to its high well density, associated Pennsylvanian production, and locally incised productive Chesterian strata. Four cross sections revealed a complex paleovalley reservoir with many potential pay zones. The methodology used to locate this paleovalley reservoir can be applied to other potential sites within the Illinois Basin and to other basins as well.
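An illustrative sketch of the first GIS step described above, using made-up well coordinates rather than the study's well database: a kernel density estimate over well locations flags densely drilled "hotspots" that can then be compared against mapped oil and gas fields.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(11)
# Hypothetical easting/northing of wells, clustered around two made-up fields.
wells = np.vstack([rng.normal([10.0, 20.0], 1.0, (300, 2)),
                   rng.normal([30.0, 5.0], 2.0, (200, 2))]).T   # shape (2, n_wells)

kde = gaussian_kde(wells)
xg, yg = np.meshgrid(np.linspace(0, 40, 120), np.linspace(0, 30, 90))
density = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)

# Simple hotspot rule: grid cells above the 95th percentile of the density surface.
hotspots = density > np.percentile(density, 95)
```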
18

On the Modifiable Areal Unit Problem and kernel home range analyses: the case of woodland caribou (Rangifer tarandus caribou)

Kilistoff, Kristen 10 September 2014 (has links)
There are myriad studies of animal habitat use that employ the notion of “home range”. Aggregated information on animal locations provides insight into a geographically discrete unit that represents the use of space by an animal. Among the various methods to delineate home range is the commonly used Kernel Density Estimation (KDE). The KDE method delineates home ranges based on an animal’s Utilization Distribution (UD). Specifically, a UD estimates a three-dimensional surface representing the probability or intensity of habitat use by an animal based on known locations. The choice of bandwidth (i.e., kernel radius) in KDE determines the level of smoothing and thus ultimately circumscribes the size and shape of an animal’s home range. The bounds of interest in a home range can then be delineated using different volume contours of the UD (e.g., 95% or 50%). Habitat variables can then be assessed within the chosen UD contour(s) to ascertain selection for certain habitat characteristics. Home range analyses that utilize the KDE method, and indeed all methods of home range delineation, are subject to the Modifiable Areal Unit Problem (MAUP), whereby changes in the scale at which data (e.g., habitat variables) are analysed can alter the outcome of statistical analyses and the resulting ecological inferences. There are two components to MAUP: the scale and zoning effects. The scale effect refers to changes to the data and, consequently, the outcome of analyses as a result of aggregating data to coarser spatial units of analysis. The aggregation of data can result in a loss of fine-scale detail as well as change the observed spatial patterns. The zone effect refers to how, when holding scale constant, the delineation of areal units in space can alter data values and ultimately the results of analyses. For example, habitat features captured within 1 km² gridded sampling units may change if 1 km² hexagonal units are used instead. This thesis holds that there are three “modifiable” factors in home range analyses that render them subject to the MAUP. The first two relate specifically to the use of the KDE method, namely the choice of bandwidth and UD contour. The third is the grain (e.g., resolution) at which habitat variables are aggregated, which applies to KDE but also more broadly to other quantitative methods of home range delineation. In the following chapters we examine the changes in values of elevation and slope that result from changes to KDE bandwidth (Chapter 2), UD contour (Chapter 3) and DEM resolution (Chapter 4). In each chapter we also examine how the observed effects of altering each individual parameter of scale (e.g., bandwidth) change when different scales of the other two parameters are considered (e.g., contour and resolution). We expected that the scale of each parameter examined would change the observed effect of the other parameters; for example, that the homogenization of data at coarser resolutions would reduce the degree of difference in variable values between the UD contours of each home range. To explore the potential effects of MAUP on home range analyses we used as a model population 13 northern woodland caribou (Rangifer tarandus). We created seasonal home ranges (winter, calving, summer, rut and fall) for each caribou using three different KDE bandwidths. Within each home range we delineated four contours based on differing levels of an animal’s UD.
We then calculated values of elevation and slope (mean, standard deviation and coefficient of variation) within the contours of each seasonal home range, using a Digital Elevation Model (DEM) aggregated to four different resolutions. We found that each parameter of scale significantly changed the values of elevation and slope within the home ranges of the model caribou population. The magnitude as well as the direction of change in slope and elevation often varied depending on the specific contour or season. There was a greater decrease in the variability of elevation within the fall and winter seasons at smaller KDE bandwidths. The topographic variables were significantly different between all contours of caribou home ranges, and the differences between contours were, in general, significantly higher in fall and winter (elevation) or calving and summer (slope). The mean and SD of slope decreased at coarser resolutions in all caribou home ranges, whereas there was no change in elevation. We also found interactive effects of all three parameters of scale, although these were not always as direct as initially anticipated. Each parameter examined (bandwidth, contour and resolution) may potentially alter the outcome of northern woodland caribou habitat analyses. We conclude that home range analyses that utilize the KDE method may be subject to MAUP by virtue of the ability to modify the spatial dimensions of the units of analysis. As such, in habitat analyses using KDE, careful consideration should be given to the choice of bandwidth, UD contour and habitat variable resolution. / Graduate / 0366 / 0329 / spicym@uvic.ca
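A hedged sketch of the basic KDE home-range workflow the thesis builds on, with simulated relocations and a single default bandwidth: fit a kernel utilization distribution and extract the density thresholds enclosing 95% (home range) and 50% (core area) of the UD volume. The thesis deliberately varies bandwidth, contour and DEM resolution; this sketch fixes one choice of each.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
locs = rng.multivariate_normal([0, 0], [[4, 1], [1, 2]], size=250).T   # (2, n) relocations

ud = gaussian_kde(locs)          # utilization distribution; bandwidth here is Scott's rule
xg, yg = np.meshgrid(np.linspace(-8, 8, 200), np.linspace(-8, 8, 200))
dens = ud(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)
cell_area = (xg[0, 1] - xg[0, 0]) * (yg[1, 0] - yg[0, 0])

def volume_contour(density, area, level):
    """Density threshold whose superlevel set contains `level` of the UD volume."""
    vals = np.sort(density.ravel())[::-1]
    cum = np.cumsum(vals) * area
    return vals[np.searchsorted(cum, level)]

thr95 = volume_contour(dens, cell_area, 0.95)   # home range boundary
thr50 = volume_contour(dens, cell_area, 0.50)   # core area
home_range_mask = dens >= thr95
```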
19

Statistical gas distribution modelling for mobile robot applications

Reggente, Matteo January 2014 (has links)
In this dissertation, we present and evaluate algorithms for statistical gas distribution modelling in mobile robot applications. We derive a representation of the gas distribution in natural environments using gas measurements collected with mobile robots. The algorithms fuse different sensor readings (gas, wind and location) to create 2D or 3D maps. Throughout this thesis, the Kernel DM+V algorithm plays a central role in modelling the gas distribution. The key idea is the spatial extrapolation of the gas measurements using a Gaussian kernel. The algorithm produces four maps: the weight map shows the density of the measurements; the confidence map shows areas in which the model is considered trustworthy; the mean map represents the modelled gas distribution; and the variance map represents the spatial structure of the variance of the mean estimate. The Kernel DM+V/W algorithm incorporates wind measurements in the computation of the models by modifying the shape of the Gaussian kernel according to the local wind direction and magnitude. The Kernel 3D-DM+V/W algorithm extends the previous algorithm to the third dimension using a tri-variate Gaussian kernel. Ground-truth evaluation is a critical issue for gas distribution modelling with mobile platforms. We propose two methods to evaluate gas distribution models. Firstly, we create a ground-truth gas distribution using a simulation environment and compare the models with this ground-truth distribution. Secondly, considering that a good model should explain the measurements and accurately predict new ones, we evaluate the models according to their ability to infer unseen gas concentrations. We evaluate the algorithms by carrying out experiments in different environments, starting with a simulated environment and ending with urban applications, in which we integrated gas sensors into robots designed for urban hygiene. We found that, typically, models that incorporate wind information outperform models that do not include the wind data.
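A minimal, hypothetical sketch of the central Kernel DM+V idea, with assumed measurement positions and concentrations: each gas reading is spatially extrapolated with a Gaussian kernel, yielding a weight map (density of measurements) and a weighted mean map (modelled distribution). The confidence and variance maps, and the wind-modified kernel of Kernel DM+V/W, are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
# Assumed measurement positions (metres) and gas concentrations from a robot run.
pos = rng.uniform(0.0, 10.0, size=(150, 2))
conc = np.exp(-np.sum((pos - [3.0, 7.0]) ** 2, axis=1) / 4.0) + 0.05 * rng.random(150)

sigma = 0.8                                    # kernel width (a tuning choice)
xg, yg = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 10, 100))
cells = np.stack([xg.ravel(), yg.ravel()], axis=1)          # grid cell centres, (n_cells, 2)

d2 = ((cells[:, None, :] - pos[None, :, :]) ** 2).sum(-1)   # squared cell-measurement distances
w = np.exp(-d2 / (2.0 * sigma ** 2))                        # Gaussian kernel weights

weight_map = w.sum(axis=1).reshape(xg.shape)                # density of measurements
mean_map = (w @ conc / np.maximum(w.sum(axis=1), 1e-12)).reshape(xg.shape)  # modelled distribution
```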
20

ILLINOIS STATEWIDE HEALTHCARE AND EDUCATION MAPPING

KC, Binita 01 December 2010 (has links)
Illinois statewide infrastructure mapping provides a basis for the economic development of the state. As part of this infrastructure mapping, this study focuses on mapping healthcare and education services for Illinois. Over 4,337 K-12 schools and 1,331 hospitals and long-term care facilities were used in analyzing healthcare and education services. Education service was measured as the ratio of population to teachers and healthcare service as the ratio of population to beds. Both services were mapped using three mapping techniques: choropleth mapping, Thiessen polygons, and kernel density estimation. The mapping was also conducted at three scales: county, census tract, and ZIP code area. The resulting maps were compared by visual interpretation and statistical correlation analysis. Moreover, spatial pattern analysis of the maps was conducted using global and local Moran's I, high/low clustering, and hotspot analysis methods. In addition, multivariate mapping was carried out to demonstrate the spatial distributions of multiple variables and their relationships. The results showed that both the choropleth mapping and Thiessen polygon methods produced service levels that were homogeneous throughout each polygon and changed abruptly at the boundaries, thereby ignoring the cross-boundary flow of people for healthcare and education services; in addition, they do not reflect the distance decay of services. Kernel density mapping quantified the continuous and spatially variable healthcare and educational services and has the potential to provide more accurate estimates. Moreover, the county-scale maps are more reliable than the census tract and ZIP code area maps. In addition, the multivariate map, obtained by a legend design that combined the values of multiple variables, demonstrated well the spatial distributions of healthcare and education services along with per capita income, and the relationships between them. Overall, Morgan, Wayne, Mason, and Ford counties had higher service levels for both education and healthcare, whereas Champaign, Johnson, and Perry counties had lower service levels. Generally, cities and the areas close to cities have better healthcare and educational service than other areas because of higher per capita income. In addition to the numbers of hospitals and schools, the healthcare and education service levels were also affected by population and per capita income. Other factors may also influence the service levels but were not taken into account in this study because of limited time and data.
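An illustrative sketch (with made-up counts) of the service measures described above: population-to-teacher and population-to-bed ratios computed per areal unit, the quantities that are then mapped by choropleth, Thiessen polygon, or kernel density techniques.

```python
import pandas as pd

# Made-up counts for three hypothetical areal units.
units = pd.DataFrame({
    "county":     ["A", "B", "C"],
    "population": [120_000, 45_000, 300_000],
    "teachers":   [900, 310, 2_400],
    "beds":       [260, 75, 980],
})
units["pop_per_teacher"] = units["population"] / units["teachers"]
units["pop_per_bed"] = units["population"] / units["beds"]
print(units[["county", "pop_per_teacher", "pop_per_bed"]])
```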
