251

Median and Mode Approximation for Skewed Unimodal Continuous Distributions using Taylor Series Expansion

Dula, Mark, Mogusu, Eunice, Strasser, Sheryl, Liu, Ying, Zheng, Shimin 06 April 2016 (has links)
Background: Measures of central tendency are among the foundational concepts of statistics, the most commonly used measures being the mean, median, and mode. While these are all simple to calculate when data conform to a unimodal symmetric distribution, either discrete or continuous, measures of central tendency are more challenging to calculate for asymmetrically distributed data. There is a gap in the current statistical literature on computing the median and mode for most skewed unimodal continuous distributions. For example, for a standard normal distribution, the mean, median, and mode are all equal to 0, and for a more general normal distribution the mode and median are still equal to the mean. Unfortunately, the mean is highly affected by extreme values. If the distribution is skewed either positively or negatively, the mean is pulled in the direction of the skew; the median and mode are more robust statistics and are not pulled as far as the mean. The traditional response is to provide an estimate of the median and mode, as current methodological approaches are limited in determining their exact values once the mean is pulled away. Methods: The purpose of this study is to test a new statistical method, utilizing the first-order and second-order partial derivatives in a Taylor series expansion, for approximating the median and mode of skewed unimodal continuous distributions. Specifically, to compute the approximate mode, the first-order derivative of the sum of the first three terms in the Taylor series expansion is set to zero and the equation is solved for the unknown. To compute the approximate median, the integral from negative infinity to the median is set equal to one half and the equation is solved for the median. Finally, to evaluate the accuracy of our derived formulae for computing the mode and median of skewed unimodal continuous distributions, a simulation study will be conducted with respect to skew normal distributions, skew t-distributions, skew exponential distributions, and others, with various parameters. Conclusions: This study has the potential to advance the measurement of central tendency, a cornerstone of public health and social science research. The study may answer an important question concerning the precision of median and mode estimates for skewed unimodal continuous distributions of data. If this method proves to be an accurate approximation of the median and mode, then it should become the method of choice when measures of central tendency are required.
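
As a rough numerical illustration of the approach described above, the sketch below approximates the mode of a skew-normal density from a three-term Taylor expansion of the pdf about the mean and solves the half-probability equation for the median; the choice of distribution, expansion point, and finite-difference step are illustrative assumptions, not the authors' derivation.

```python
import numpy as np
from scipy import stats, optimize

dist = stats.skewnorm(4)             # positively skewed unimodal example
mu = dist.mean()                     # expansion point (an assumption)
h = 1e-4                             # finite-difference step

# Numerical first and second derivatives of the pdf at the expansion point
f1 = (dist.pdf(mu + h) - dist.pdf(mu - h)) / (2 * h)
f2 = (dist.pdf(mu + h) - 2 * dist.pdf(mu) + dist.pdf(mu - h)) / h ** 2

# Three-term Taylor expansion f(x) ~ f(mu) + f1*(x - mu) + f2*(x - mu)^2 / 2;
# setting its derivative to zero gives an approximate mode.
mode_approx = mu - f1 / f2

# Median: solve F(m) = 1/2 numerically (the cdf could itself be expanded).
median_approx = optimize.brentq(lambda m: dist.cdf(m) - 0.5, mu - 5, mu + 5)

print(mode_approx, median_approx, dist.median())
```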
252

Development of generalized index-removal models, with particular attention to catchability issues

Ihde, Thomas F. 01 January 2006 (has links)
The index-removal method estimates abundance, exploitation rate, and the catchability coefficient, given surveys conducted before and after a known removal. The method assumes a closed population between surveys. Index-removal has seldom been applied due to its strong assumption of constant survey catchabilities. This work generalizes the method to allow multiple years of data to be incorporated and the assumptions of the original model to be relaxed. If catchability is constant across years, precision can be improved by analyzing multi-year data simultaneously. Two multiple-year models were developed: the first, 1qIR, assumes constant catchability within and among years; the second, 2qIR, allows catchability to change between surveys within years, but assumes survey-specific catchability constant across years. The new models were tested by Monte Carlo simulation and then applied to data from two southern rock lobster (Jasus edwardsii) populations. The 1qIR model produced reasonable estimates in one application, but the 2qIR model was required to produce reasonable estimates for the second population. A likelihood ratio test found 1qIR to be the most parsimonious model, even when the assumption of constant survey catchability appeared to be violated. In that case, diagnostic plots suggested that the 2qIR model provided the most reliable estimates. However, when the constant catchability assumption is tenable, the 1qIR model offers the greatest precision for parameter estimates. Size- and sex-specific heterogeneity of catchability introduces bias in model estimates. Field experiments were performed to test whether the catchability of small lobster was constant for southern rock lobster during two seasons when fishing occurs. No evidence of heterogeneous catchability was observed during the spring. However, significantly more small lobster were caught in control traps and traps seeded with one large adult male lobster than were caught in traps seeded with one large adult female during the summer, when females are preparing to molt and reproduce in Tasmania. Because heterogeneous catchability occurred during the summer but not the spring, an index of recruitment based on the catch of lobsters one molt size below legal size might be developed for the spring; however, more sampling is needed to resolve the annual timing of sex- and size-specific catchability changes.
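
For reference, a minimal sketch of the classic single-survey-pair index-removal estimators that the generalized 1qIR and 2qIR models build on; the survey indices and removal below are hypothetical numbers, not data from the thesis.

```python
# Classic single-year index-removal estimators (closed population, equal
# survey catchability): i1 = q*N before the removal, i2 = q*(N - R) after.
def index_removal(i1, i2, removal):
    """Return (abundance, catchability, exploitation) from survey indices
    i1 (before removal), i2 (after removal), and a known removal."""
    if i1 <= i2:
        raise ValueError("pre-removal index must exceed post-removal index")
    q = (i1 - i2) / removal          # catchability coefficient
    n = i1 / q                       # abundance at the first survey
    u = removal / n                  # exploitation rate
    return n, q, u

# Hypothetical survey counts and removal
print(index_removal(i1=400, i2=250, removal=3000))
```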
253

Bayesian surface smoothing under anisotropy

Chakravarty, Subhashish 01 January 2007 (has links)
Bayesian surface smoothing using splines usually proceeds by choosing the smoothness parameter through data-driven methods such as generalized cross validation. In this methodology, knots of the splines are assumed to lie at the data locations. When anisotropy is present in the data, it is typically modeled via parametric functions. In the present thesis, we have proposed a non-parametric approach to Bayesian surface smoothing in the presence of anisotropy. We use eigenfunctions generated by thin-plate splines as our basis functions. Using eigenfunctions does away with having to place knots arbitrarily, as is customarily done. The smoothing parameter, the anisotropy matrix, and other parameters are simultaneously updated by a Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampler. Unique in our implementation is model selection, which is done concurrently with the parameter updates. Since the posterior distribution of the coefficients of the basis functions for any given model order is available in closed form, we are able to simplify the sampling algorithm in the model selection step. This also helps us isolate the parameters which influence the model selection step. We investigate the relationship between the number of basis functions used in the model and the smoothness parameter and find that a delicate balance exists between the two: higher values of the smoothness parameter correspond to a larger number of basis functions being selected. Use of a non-parametric approach to Bayesian surface smoothing provides more modeling flexibility. We are not constrained by a parametric form of the covariance, as in earlier methods. A Bayesian approach also allows us to include the results obtained from previous analysis of the same data, if any, as prior information. It also allows us to evaluate pointwise estimates of variability of the fitted surface. We believe that this work also poses many questions for future research.
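
The sketch below shows one way eigenfunction-type basis vectors can be obtained from a thin-plate spline kernel evaluated over observed locations; the isotropic kernel form, the random locations, and the truncation to 20 leading eigenvectors are assumptions for illustration and do not reproduce the thesis's anisotropic construction or RJMCMC sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
locs = rng.uniform(size=(200, 2))            # hypothetical data locations

def tps_kernel(x, y):
    """2-D thin-plate spline radial kernel r^2 log(r), with value 0 at r = 0."""
    diff = x[:, None, :] - y[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

K = tps_kernel(locs, locs)
eigvals, eigvecs = np.linalg.eigh(K)         # symmetric eigendecomposition
order = np.argsort(np.abs(eigvals))[::-1]    # rank by eigenvalue magnitude
basis = eigvecs[:, order[:20]]               # keep the leading 20 basis vectors
print(basis.shape)                           # (200, 20)
```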
254

Partly parametric generalized additive model

Zhang, Tianyang 01 December 2010 (has links)
In many scientific studies, the response variable bears a generalized nonlinear regression relationship with a certain covariate of interest, which may, however, be confounded by other covariates with unknown functional form. We propose a new class of models, the partly parametric generalized additive model (PPGAM), for doing generalized nonlinear regression with the confounding covariate effects adjusted nonparametrically. To avoid the curse of dimensionality, the PPGAM specifies that, conditional on the covariates, the response distribution belongs to the exponential family with the mean linked to an additive predictor comprising a nonlinear parametric function that is of main interest, plus additive, smooth functions of other covariates. The PPGAM extends both the generalized additive model (GAM) and the generalized nonlinear regression model. We propose to estimate a PPGAM by the method of penalized likelihood. We derive some asymptotic properties of the penalized likelihood estimator, including consistency and asymptotic normality of the parametric estimator of the nonlinear regression component. We propose a model selection criterion for the PPGAM, which resembles the BIC. We illustrate the new methodologies with simulations and real applications. We have developed an R package, PPGAM, that implements the methodologies expounded herein.
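
A hedged sketch of the PPGAM structure follows: a Poisson response whose predictor combines a nonlinear parametric term of interest with a penalized smooth adjustment for a confounder, fitted by penalized likelihood. The simulated data, the parametric form, the Gaussian-bump basis for the smooth term, and the fixed penalty weight are illustrative assumptions; this is not the PPGAM R package.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 4, n)                      # covariate of main interest
z = rng.uniform(0, 1, n)                      # confounding covariate
eta_true = 1.5 * (1 - np.exp(-1.2 * x)) + np.sin(2 * np.pi * z)
y = rng.poisson(np.exp(eta_true))

centers = np.linspace(0, 1, 10)               # smooth-term basis (assumed form)
B = np.exp(-0.5 * ((z[:, None] - centers) / 0.15) ** 2)

def penalized_nll(par, lam=1.0):
    a, b = par[:2]                             # nonlinear parametric component
    beta = par[2:]                             # smooth-term coefficients
    eta = a * (1 - np.exp(-b * x)) + B @ beta
    mu = np.exp(eta)
    # Poisson negative log-likelihood (constant dropped) plus ridge penalty
    return np.sum(mu - y * eta) + lam * beta @ beta

fit = minimize(penalized_nll, x0=np.r_[1.0, 1.0, np.zeros(10)], method="BFGS")
print(fit.x[:2])                               # estimates of (a, b)
```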
255

Improved interval estimation of comparative treatment effects

Van Krevelen, Ryne Christian 01 May 2015 (has links)
Comparative experiments, in which subjects are randomized to one of two treatments, are performed often. There is no shortage of papers testing whether a treatment effect exists and providing confidence intervals for the magnitude of this effect. While it is well understood that the object and scope of inference for an experiment will depend on what assumptions are made, these entities are not always clearly presented. We have proposed one possible method, which is based on the ideas of Jerzy Neyman, that can be used for constructing confidence intervals in a comparative experiment. The resulting intervals, referred to as Neyman-type confidence intervals, can be applied in a wide range of cases. Special care is taken to note which assumptions are made and what object and scope of inference are being investigated. We have presented a notation that highlights which parts of a problem are being treated as random. This helps ensure the focus on the appropriate scope of inference. The Neyman-type confidence intervals are compared to possible alternatives in two different inference settings: one in which inference is made about the units in the sample and one in which inference is made about units in a fixed population. A third inference setting, one in which inference is made about a process distribution, is also discussed. It is stressed that certain assumptions underlying this third type of inference are unverifiable. When these assumptions are not met, the resulting confidence intervals may cover their intended target well below the desired rate. Through simulation, we demonstrate that the Neyman-type intervals have good coverage properties when inference is being made about a sample or a population. In some cases the alternative intervals are much wider than necessary on average. Therefore, we recommend that researchers consider using our Neyman-type confidence intervals when carrying out inference about a sample or a population as it may provide them with more precise intervals that still cover at the desired rate.
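
A minimal sketch of the classic Neyman variance estimate and interval for a difference in treatment means under randomization appears below; whether it matches the thesis's Neyman-type intervals in every detail is an assumption, and the experimental data are simulated placeholders.

```python
import numpy as np
from scipy import stats

def neyman_interval(y_treat, y_ctrl, level=0.95):
    """Difference in means with the conservative Neyman variance estimate
    s1^2/n1 + s0^2/n0 and a normal-approximation interval."""
    y1, y0 = np.asarray(y_treat, float), np.asarray(y_ctrl, float)
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    z = stats.norm.ppf(0.5 + level / 2)
    return est - z * se, est + z * se

rng = np.random.default_rng(2)                 # hypothetical experiment
treat = rng.normal(5.0, 2.0, 40)
ctrl = rng.normal(4.0, 2.0, 40)
print(neyman_interval(treat, ctrl))
```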
256

Issues in Bayesian Gaussian Markov random field models with application to intersensor calibration

Liang, Dong 01 December 2009 (has links)
A long-term record of the earth's vegetation is important in studies of global climate change. Over the last three decades, multiple data sets on vegetation have been collected using different satellite-based sensors. There is a need for methods that combine these data into a long-term earth system data record. The Advanced Very High Resolution Radiometer (AVHRR) has provided reflectance measures of the entire earth since 1978. Physical and statistical models have been used to improve the consistency and reliability of this record. The Moderate Resolution Imaging Spectroradiometer (MODIS) has provided measurements with superior radiometric properties and geolocation accuracy. However, this record is available only since 2000. In this thesis, we perform statistical calibration of AVHRR to MODIS. We aim to: (1) fill in gaps in the ongoing MODIS record; (2) extend MODIS values back to 1982. We propose Bayesian mixed models to predict MODIS values using snow cover and AVHRR values as covariates. Random effects are used to account for spatiotemporal correlation in the data. We estimate the parameters based on the data after 2000, using Markov chain Monte Carlo methods. We then back-predict MODIS data between 1978 and 1999, using the posterior samples of the parameter estimates. We develop new Conditional Autoregressive (CAR) models for seasonal data. We also develop new sampling methods for CAR models. Our approach enables filling in gaps in the MODIS record and back-predicting these values to construct a consistent historical record. The Bayesian framework incorporates multiple sources of variation in estimating the accuracy of the obtained data. The approach is illustrated using vegetation data over a region in Minnesota.
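
As background for the CAR modeling mentioned above, the sketch below builds the standard proper CAR precision matrix Q = tau(D - rho W) on a small lattice and draws one Gaussian Markov random field from it; the seasonal CAR variants and new samplers developed in the thesis are not reproduced, and tau and rho are assumed values.

```python
import numpy as np

side = 10
n = side * side
W = np.zeros((n, n))                     # rook-neighbour adjacency on a grid
for i in range(side):
    for j in range(side):
        k = i * side + j
        if i + 1 < side:
            W[k, k + side] = W[k + side, k] = 1
        if j + 1 < side:
            W[k, k + 1] = W[k + 1, k] = 1

tau, rho = 2.0, 0.9                      # assumed precision and dependence
D = np.diag(W.sum(axis=1))
Q = tau * (D - rho * W)                  # proper CAR precision (positive definite)

# Draw x ~ N(0, Q^{-1}) via the Cholesky factor of Q
L = np.linalg.cholesky(Q)
z = np.random.default_rng(3).standard_normal(n)
x = np.linalg.solve(L.T, z)
print(x.reshape(side, side).std())
```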
257

Statistical inference of distributed delay differential equations

Zhou, Ziqian 01 August 2016 (has links)
In this study, we aim to develop a new likelihood-based method for estimating parameters of ordinary differential equation (ODE) and delay differential equation (DDE) models. These models are important for modeling dynamical processes that are described in terms of their derivatives and are widely used in many fields of modern science, such as physics, chemistry, biology, and the social sciences. We use our new approach to study a distributed delay differential equation model, the statistical inference of which has, to our knowledge, been unexplored. Estimating a distributed DDE model or an ODE model with time-varying coefficients results in a large number of parameters. We also apply regularization for efficient estimation of such models. We assess the performance of our new approaches using simulation and apply them to data from epidemiology and ecology.
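
A minimal sketch of likelihood-based estimation for an ODE model follows: logistic growth observed with Gaussian noise, with parameters found by minimizing the negative log-likelihood of the solver output. The model, noise level, and data are illustrative assumptions; a distributed-delay model would replace the right-hand side.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

t_obs = np.linspace(0, 10, 25)
r_true, k_true, x_init = 0.8, 10.0, 0.5

def trajectory(r, k):
    """Logistic growth solved on the observation grid."""
    sol = solve_ivp(lambda t, x: r * x * (1 - x / k), (0, 10), [x_init],
                    t_eval=t_obs, rtol=1e-8)
    return sol.y[0]

rng = np.random.default_rng(4)
y_obs = trajectory(r_true, k_true) + rng.normal(0, 0.3, t_obs.size)

def neg_log_lik(par):
    r, k, sigma = par
    if r <= 0 or k <= 0 or sigma <= 0:
        return np.inf
    resid = y_obs - trajectory(r, k)
    # Gaussian negative log-likelihood (additive constant dropped)
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + t_obs.size * np.log(sigma)

fit = minimize(neg_log_lik, x0=[0.5, 8.0, 0.5], method="Nelder-Mead")
print(fit.x)                             # estimates of (r, k, sigma)
```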
258

Quantitative analysis of extreme risks in insurance and finance

Yuan, Zhongyi 01 May 2013 (has links)
In this thesis, we aim at a quantitative understanding of extreme risks. We use heavy-tailed distribution functions to model extreme risks, and use various tools, such as copulas and multivariate regular variation (MRV), to model dependence structures. We focus on modeling as well as quantitatively estimating certain measures of extreme risk. We start with a credit risk management problem. More specifically, we consider a credit portfolio of multiple obligors subject to possible default. We propose a new structural model for the loss given default, which takes into account the severity of default. Then we study the tail behavior of the loss given default under the assumption that the losses of the obligors jointly follow an MRV structure. This structure provides an ideal framework for modeling both heavy tails and asymptotic dependence. Using hidden regular variation (HRV), we also accommodate the asymptotically independent case. Multivariate models involving Archimedean copulas, mixtures, and linear transforms are revisited. We then derive asymptotic estimates for the Value at Risk and Conditional Tail Expectation of the loss given default and compare them with the traditional empirical estimates. Next, we consider an investor who invests in multiple lines of business and study a capital allocation problem. A randomly weighted sum structure is proposed, which can capture both the heavy-tailedness of losses and the dependence among them, while at the same time separating the magnitudes from the dependence. To pursue as much generality as possible, we do not impose any requirement on the dependence structure of the random weights. We first study the tail behavior of the total loss and obtain asymptotic formulas under various sets of conditions. Then we derive asymptotic formulas for capital allocation and further refine them to be explicit for some cases. Finally, we conduct extreme risk analysis for an insurer who makes investments. We consider a discrete-time risk model in which the insurer is allowed to invest a proportion of its wealth in a risky stock and keep the rest in a risk-free bond. We assume that the claim amounts within individual periods follow an autoregressive process with heavy-tailed innovations and that the log-returns of the stock follow another autoregressive process, independent of the former one. We derive an asymptotic formula for the finite-time ruin probability and propose a hybrid method, combining simulation with asymptotics, to compute this ruin probability more efficiently. As an application, we consider a portfolio optimization problem in which we determine the proportion invested in the risky stock that maximizes the expected terminal wealth subject to a constraint on the ruin probability.
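
For context, the sketch below computes empirical Value at Risk and Conditional Tail Expectation for a simulated heavy-tailed (Pareto) loss, the kind of empirical estimates the thesis compares against its asymptotic formulas; the tail index, scale, and confidence level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
# Classical Pareto losses with tail index 2.5 and scale 10 (assumed values)
losses = (rng.pareto(a=2.5, size=100_000) + 1) * 10.0

def var_cte(x, level=0.99):
    """Empirical VaR (quantile) and CTE (mean loss beyond the VaR)."""
    var = np.quantile(x, level)
    cte = x[x > var].mean()
    return var, cte

print(var_cte(losses))
```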
259

Probabilistic Modeling of Lava Flows: A Hazard Assessment for the San Francisco Volcanic Field, Arizona

Harburger, Aleeza 07 March 2014 (has links)
This study serves as a first step towards a comprehensive hazard assessment for the San Francisco volcanic field in northern Arizona, which can be applied to local response plans and educational initiatives. The primary goal of this thesis is to resolve the conditional probability that, given a lava flow effusing from a new vent in the San Francisco volcanic field, it will inundate the city limits of Flagstaff. The spatial distribution of vents within the San Francisco volcanic field was analyzed in order to execute a lava flow simulation to determine the inundation hazard to Flagstaff. The Gaussian kernel function for estimating spatial density showed that there is a 99% chance that a future vent will be located within a 3.6 x 10^9 m^2 area about 20 kilometers north of Flagstaff. This area contains the location of the most recent eruption at Sunset Crater, suggesting that the model is a good predictor of future vent locations. A Monte Carlo analysis of potential vent locations (N = 7,769) showed that 3.5% of simulated vents generated lava flows that inundated Flagstaff, and 1.1% of simulated vents were located within the city limits. Based on the average recurrence rate of vents formed during the Brunhes chronozone, the aggregate probability of lava flow inundation in Flagstaff is 1.1 x 10^-5 per year. This suggests that there is a need for the city to plan for lava flows and associated hazards, especially forest fires. Even though it is unlikely that the city will ever have to utilize such a plan, it is imperative that thorough mitigation and response plans are established now, before the onset of renewed volcanic activity.
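
The sketch below illustrates two ingredients of this kind of analysis: a Gaussian kernel density fitted to vent locations (used to sample hypothetical future vents) and the aggregate annual probability as the product of a vent recurrence rate and the conditional inundation probability. The vent coordinates are fabricated placeholders, the recurrence rate is an assumed value consistent with the abstract's figures, and the lava-flow simulation itself is not reproduced.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
# Fabricated vent coordinates (placeholder easting/northing in metres)
vents = rng.normal(loc=[440_000, 3_910_000], scale=8_000, size=(600, 2))

kde = gaussian_kde(vents.T)                  # spatial density of mapped vents
new_vents = kde.resample(7_769, seed=7).T    # simulated future vent locations

recurrence_rate = 3.2e-4                     # assumed vents per year
p_inundation_given_vent = 0.035              # conditional probability (3.5%)
annual_prob = recurrence_rate * p_inundation_given_vent
print(new_vents.shape, annual_prob)          # ~1.1e-5 per year
```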
260

A Study of Four Statistics, Used in Analysis of Contingency Tables, in the Presence of Low Expected Frequencies

Post, Jane R. 01 May 1975 (has links)
Four statistics used for the analysis of categorical data were observed in the presence of many zero cell frequencies in two-way classification contingency tables. The purpose of this study was to determine the effect of many zero cell frequencies upon the distribution properties of each of the four statistics studied. It was found that Light and Margolin's C and Pearson's Chi-square statistic closely approximated the Chi-square distribution as long as less than one-third of the table cells were empty. It was found that the mean and variance of Kullback's 2I were larger than the expected values in the presence of few empty cells. The mean of 2I was found to become small in the presence of large numbers of empty cells. Ku's corrected 2I statistic was found, in the presence of many zero cell frequencies, to have a much larger mean value than would be expected in a Chi-square distribution. Kullback's 2I demonstrated a peculiar distribution change in the presence of large numbers of zero cell frequencies: 2I first increased, then decreased in average value.
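
The sketch below mimics the kind of comparison described: simulating sparse two-way tables and comparing the averages of Pearson's chi-square and the likelihood-ratio statistic 2I (G^2) with the nominal chi-square mean (its degrees of freedom). The table dimensions, cell probabilities, and sample size are assumptions, and Light and Margolin's C and Ku's correction are not implemented here.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(8)
rows, cols, n_total, n_sims = 4, 6, 40, 2000   # small n gives many empty cells
probs = np.full(rows * cols, 1 / (rows * cols))
df = (rows - 1) * (cols - 1)

pearson, g2 = [], []
for _ in range(n_sims):
    table = rng.multinomial(n_total, probs).reshape(rows, cols)
    if table.sum(axis=0).min() == 0 or table.sum(axis=1).min() == 0:
        continue                                # skip tables with empty margins
    stat, _, _, _ = chi2_contingency(table, correction=False)
    pearson.append(stat)
    g2_stat, _, _, _ = chi2_contingency(table, correction=False,
                                        lambda_="log-likelihood")
    g2.append(g2_stat)

print(df, np.mean(pearson), np.mean(g2))        # compare both means with df
```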
