21. Efficacy of robust regression applied to fractional factorial treatment structures
McCants, Michael (January 1900)
Master of Science / Department of Statistics / James J. Higgins
Completely randomized and randomized block designs involving n factors, each at two levels, are used to screen for the effects of a large number of factors. With such designs it may not be possible, because of cost or time constraints, to run each treatment combination more than once; in some cases, only a fraction of all the treatments may be run. With a large number of factors and limited observations, even one outlier can adversely affect the results. Robust regression methods are designed to down-weight the adverse effects of outliers. However, to our knowledge, practitioners do not routinely apply robust regression methods in the context of fractional replication of 2^n factorial treatment structures. The purpose of this report is to examine how robust regression methods perform in this context.
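As an illustration of the approach (a sketch with hypothetical effects and a planted outlier, not the report's simulation design), the following fits an unreplicated 2^(4-1) fraction by least squares and by a Huber M-estimate via statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# 2^(4-1) fractional factorial: A, B, C form a full 2^3; D is aliased as D = ABC.
runs = np.array([[a, b, c, a * b * c]
                 for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
A, B, C, D = runs.T

# One observation per run (no replication); true effects on A and C only.
y = 10 + 3 * A + 2 * C + rng.normal(0, 1, size=8)
y[5] += 15                         # plant a single outlier

X = sm.add_constant(np.column_stack([A, B, C, D]))
ols = sm.OLS(y, X).fit()                              # least squares
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimate

# With only 8 runs and 5 parameters there are few residual degrees of
# freedom, which is exactly what makes this setting challenging.
print("OLS coefficients:", np.round(ols.params, 2))
print("RLM coefficients:", np.round(rlm.params, 2))
```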
22. Individual mediating effects and the concept of terminal measures data
Serasinghe, Roshan Niranjala (January 1900)
Doctor of Philosophy / Department of Statistics / Gary Gadbury
Research in science and statistics often goes beyond the two-variable cause-and-effect relationship to ask what connects a causal relationship, and what changes the magnitude or direction of the causal relationship, between two variables, a predictor (T) and an outcome (Y).
A mediator (Z) is a third variable that links a cause and an effect: T causes Z, and Z causes Y. In general, a variable may be said to function as a mediator to the extent that it accounts for the relation between the predictor and the outcome (Baron and Kenny, 1986).
The initial question concerns the appropriate characterization of a mediation effect. Most studies comparing one or more treatments focus on an average mediating effect. This average can be misleading when the mediating effects vary from subject to subject in the population. The primary focus of this research is to investigate individual mediating effects in a population and to define a variance of these individual mediating effects. A concept called subject-mediator (treatment) interaction is presented, and its role in evaluating a mediator's behavior on a population of units is studied. This is done using a framework sometimes called a counterfactual model. Some common experimental designs that provide different knowledge about this interaction term are studied. Subgroup analysis is the most common analytic approach for examining heterogeneity of mediating effects.
In mediation analysis, situations can arise where Z and Y cannot both be measured on an individual unit. We refer to such data as terminal measures data. We show a design in which a mediating effect cannot be estimated from terminal measures data, and another in which it can be, under an assumption linked to the idea of pseudo-replication. These ideas are discussed, and a simulation study illustrates the issues involved in analyzing terminal measures data. We know of no currently available methods that specifically address terminal measures data.
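To make the counterfactual framing concrete, the following sketch uses a hypothetical data-generating model (not the thesis's) in which a subject-specific mediator slope creates subject-mediator interaction, so individual mediating effects vary across units:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Potential mediator values under control (T=0) and treatment (T=1).
z0 = rng.normal(0.0, 1.0, n)
z1 = z0 + 2.0                      # treatment shifts the mediator

# Subject-specific mediator effect on Y: a subject-mediator interaction.
b_i = rng.normal(1.0, 0.8, n)      # varies from subject to subject

def y(t, z, b):
    # Hypothetical outcome model: a direct effect of T plus the mediated path.
    return 0.5 * t + b * z

# Individual mediating effect: the change in Y due only to the
# treatment-induced change in the mediator.
ime = y(1, z1, b_i) - y(1, z0, b_i)

print("mean mediating effect:", ime.mean().round(3))          # ~ 2 * E[b_i] = 2
print("variance of individual effects:", ime.var().round(3))
# In real data only one of (z0, z1) is observed per subject, so this
# variance is not identifiable without further assumptions.
```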
23. A simulation comparison of two methods for controlling the experiment-wise Type I error rate of correlated tests for contrasts in one-way completely randomized designs
Jiao, Yuanfang (January 1900)
Master of Science / Department of Statistics / Paul I. Nelson
A Bonferroni solution and an ordered-P-value solution to the problem of controlling the experiment-wise Type I error rate are studied and compared in terms of actual size and power when carrying out correlated tests. Although both solutions can be used in a wide variety of settings, they are investigated here only in the context of multiple testing of whether specified pairwise comparisons of means, selected before the data are collected, are all equal to zero in a completely randomized, balanced, one-factor design where the data are independent random samples from normal distributions all having the same variance. Simulations indicate that the two methods perform very similarly and are effective in controlling the experiment-wise Type I error rate at a nominal level of 0.05. Because the ordered-P-value method has, almost uniformly, slightly greater power, it is my recommendation for use in the setting of this report.
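The report's ordered-P-value procedure is not spelled out in this abstract; Holm's step-down method is the standard ordered-P-value competitor to Bonferroni, so the sketch below uses it as a stand-in to show why the ordered approach can reject more hypotheses at the same experiment-wise level:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    # Reject H_i when p_i <= alpha / m.
    p = np.asarray(pvals)
    return p <= alpha / p.size

def holm(pvals, alpha=0.05):
    # Step-down: compare the ordered p-values with alpha / (m - k) and
    # stop at the first non-rejection; later hypotheses are retained.
    p = np.asarray(pvals)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break
    return reject

pvals = [0.004, 0.013, 0.031, 0.12]
print("Bonferroni:", bonferroni(pvals))  # rejects only the smallest p-value
print("Holm:      ", holm(pvals))        # rejects the two smallest
```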
24. Nonparametric tests for longitudinal data
Dong, Lei (January 1900)
Master of Science / Department of Statistics / Haiyan Wang
The purpose of this report is to numerically compare several tests that are applicable to longitudinal data when the experiment contains a large number of treatments or experimental conditions. Such data are increasingly common as technology advances. Of interest is to evaluate whether there are significant main effects of treatment or time, and their interaction. Traditional methods such as linear mixed-effects models (LME), generalized estimating equations (GEE), and the Wilks' lambda, Hotelling-Lawley, and Pillai multivariate tests were developed under either parametric distributional assumptions or the assumption of a large number of replications. A few recent tests, such as those of Zhang (2008) and Bathke & Harrar (2008), were developed specifically for the setting of a large number of treatments with possibly few replications. In this report, I present numerical studies of these tests, with performance evaluated on data generated from several distributions.
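As a point of reference for the traditional methods, the sketch below (simulated long-format data with hypothetical variable names) fits the LME and GEE competitors in statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_time = 30, 4

# Long-format longitudinal data: repeated measures within each subject.
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_time),
    "time": np.tile(np.arange(n_time), n_subj),
    "trt": np.repeat(rng.integers(0, 2, n_subj), n_time),
})
subj = np.repeat(rng.normal(0, 1, n_subj), n_time)   # random subject effects
df["y"] = 1.0 + 0.5 * df.trt + 0.3 * df.time + subj + rng.normal(0, 1, len(df))

# Linear mixed-effects model with a random intercept per subject.
lme = smf.mixedlm("y ~ trt * time", df, groups=df["subject"]).fit()
print(lme.fe_params)

# GEE with an exchangeable working correlation within subjects.
gee = smf.gee("y ~ trt * time", groups="subject", data=df,
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.params)
```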
25. Stochastic Mortality Modelling
Liu, Xiaoming (28 July 2008)
For life insurance and annuity products whose payoffs depend on future mortality rates, there is a risk that realized mortality rates will differ from the anticipated rates accounted for in pricing and reserving calculations. This is termed mortality risk. Since mortality risk is difficult to diversify and has significant financial impacts on insurance policies and pension plans, it is now well accepted that stochastic approaches should be adopted to model mortality risk and to value mortality-linked securities.
The objective of this thesis is to propose the use of a time-changed Markov process to describe stochastic mortality dynamics for pricing and risk management purposes. Analytical and empirical properties of these dynamics are investigated using a matrix-analytic methodology, and applications of the proposed model to the valuation of mortality-linked securities are explored.
More specifically, we consider a finite-state Markov process with one absorbing state. The Markov process is related to an underlying aging mechanism, and the survival time is viewed as the time until absorption; the resulting distribution for the survival time is a so-called phase-type distribution. This approach differs from traditional curve-fitting mortality models in that the survival probabilities are now linked to an underlying Markov aging process. Markov and phase-type distribution theories therefore provide a flexible and tractable framework for modelling the mortality dynamics, and the time change allows us to incorporate the uncertainties embedded in future mortality evolution.
The proposed model is applied to price the EIB/BNP Longevity Bonds and other mortality derivatives under the assumption that interest rates and mortality rates are independent. A calibration method is suggested so that the model can utilize both market price information involving the relevant mortality risk and the latest mortality projections. The model is also fitted to various types of population mortality data for empirical study; the fitting results show that the model captures the stylized mortality patterns very well.
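As an illustration of the phase-type construction (with an illustrative generator, not the thesis's fitted parameters, and without the time change), survival probabilities follow from the matrix exponential of the sub-generator over the transient aging states:

```python
import numpy as np
from scipy.linalg import expm

# Sub-generator T over the transient (aging) states; the deficit in each
# row sum is that state's absorption (death) rate, t0 = -T @ 1.
T = np.array([[-0.06,  0.05,  0.00],
              [ 0.00, -0.15,  0.12],
              [ 0.00,  0.00, -0.40]])   # illustrative aging intensities
alpha = np.array([1.0, 0.0, 0.0])       # start in the first phase

def survival(t):
    # Phase-type survival function: S(t) = alpha @ expm(T t) @ 1.
    return alpha @ expm(T * t) @ np.ones(3)

for t in (10, 30, 50, 70):
    print(f"S({t}) = {survival(t):.4f}")
```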
26. Some limit behaviors for the LS estimators in errors-in-variables regression model
Chen, Shu (January 1900)
Master of Science / Department of Statistics / Weixing Song
There has been continuing interest among statisticians in regression models in which the independent variables are measured with error, and there is a considerable literature on the subject. In this report, we discuss the errors-in-variables regression model

y_i = β0 + β1 x_i + β2 z_i + ϵ_i,    X_i = x_i + u_i,    Z_i = z_i + v_i,

with i.i.d. errors (ϵ_i, u_i, v_i) for i = 1, 2, ..., n, and find the least squares estimators for the parameters of interest. Both weak and strong consistency of the least squares estimators β̂0, β̂1, and β̂2 of the unknown parameters β0, β1, and β2 are obtained. Moreover, under regularity conditions, the asymptotic normality of the estimators is established.
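The report's estimators are not reproduced in this abstract; as a minimal illustration of the model and of why a correction is needed, the sketch below simulates the design, shows the attenuation of naive least squares computed on the error-contaminated covariates, and applies one standard moment correction assuming the measurement-error variances are known:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
b0, b1, b2 = 1.0, 2.0, -1.5

# Latent covariates and the model  y = b0 + b1*x + b2*z + eps.
x = rng.normal(0, 1, n)
z = rng.normal(0, 1, n)
y = b0 + b1 * x + b2 * z + rng.normal(0, 1, n)

# Observed covariates are measured with error: X = x + u, Z = z + v.
X = x + rng.normal(0, 0.5, n)
Z = z + rng.normal(0, 0.5, n)

# Naive least squares on (X, Z) is attenuated toward zero, which is why
# errors-in-variables estimators need a correction term.
M = np.column_stack([np.ones(n), X, Z])
beta_naive = np.linalg.lstsq(M, y, rcond=None)[0]
print("naive LS:    ", beta_naive.round(3))   # roughly (1, 1.6, -1.2)

# Moment correction with known error variances (sigma_u = sigma_v = 0.5).
S = M.T @ M / n - np.diag([0.0, 0.25, 0.25])
beta_corr = np.linalg.solve(S, M.T @ y / n)
print("corrected LS:", beta_corr.round(3))    # close to (1, 2, -1.5)
```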
27. The robustness of confidence intervals for effect size in one way designs with respect to departures from normality
Hembree, David (January 1900)
Master of Science / Department of Statistics / Paul Nelson
Effect size is a concept that was developed to bridge the gap between practical and statistical significance. In the context of completely randomized one way designs, the setting considered here, inference for effect size has only been developed under normality. This report is a simulation study investigating the robustness of nominal 0.95 confidence intervals for effect size with respect to departures from normality, in terms of their coverage rates and lengths. In addition to the normal distribution, data are generated from four non-normal distributions: logistic, double exponential, extreme value, and uniform.
The report finds that coverage rates under the logistic, double exponential, and extreme value distributions drop as effect size increases, while, as expected, the coverage rate under the normal distribution remains very steady at 0.95. Interestingly, the uniform distribution produced coverage rates above 0.95 that increased with effect size. Overall, within the scope of the settings considered, normal theory confidence intervals for effect size are robust for small effect sizes and not robust for large effect sizes. Since the magnitude of the effect size is typically not known, researchers are advised to investigate the assumption of normality before constructing normal theory confidence intervals for effect size.
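The report's exact simulation settings are not given in this abstract; the sketch below illustrates the style of study, assuming the normal-theory interval is obtained by inverting the noncentral-F distribution (the standard approach) and using illustrative group means with uniform errors scaled to unit variance:

```python
import numpy as np
from scipy.stats import ncf
from scipy.optimize import brentq

rng = np.random.default_rng(5)
k, n = 4, 10                         # groups and replicates per group
N, df1, df2 = k * n, k - 1, k * (n - 1)
mu = np.array([0.0, 0.0, 0.5, 1.0])  # illustrative group means (sigma = 1)
f_true = np.sqrt(np.sum((mu - mu.mean())**2) / k)

def ci_effect_size(F_obs, conf=0.95):
    # Invert the noncentral-F cdf in the noncentrality parameter lambda,
    # then convert to the effect-size scale f = sqrt(lambda / N).
    def lam_at(p):
        g = lambda lam: ncf.cdf(F_obs, df1, df2, lam) - p
        return 0.0 if g(1e-8) < 0 else brentq(g, 1e-8, 1000.0)
    a = (1 - conf) / 2
    return np.sqrt(lam_at(1 - a) / N), np.sqrt(lam_at(a) / N)

cover, reps = 0, 500
for _ in range(reps):
    e = rng.uniform(-np.sqrt(3), np.sqrt(3), (k, n))  # uniform, unit variance
    y = mu[:, None] + e
    msb = n * np.sum((y.mean(axis=1) - y.mean())**2) / df1
    msw = np.sum((y - y.mean(axis=1, keepdims=True))**2) / df2
    lo, hi = ci_effect_size(msb / msw)
    cover += lo <= f_true <= hi
print("estimated coverage:", cover / reps)
```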
28. Robust linear regression
Bai, Xue (January 1900)
Master of Science / Department of Statistics / Weixin Yao
In practice, when applying a statistical method, it often occurs that some observations deviate from the usual model assumptions. Least-squares (LS) estimators are very sensitive to outliers; even a single atypical value may have a large effect on the regression parameter estimates. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this report, we review various robust regression methods, including M-estimates, LMS estimates, LTS estimates, S-estimates, τ-estimates, MM-estimates, GM-estimates, and REWLS estimates. Finally, we compare these robust estimates in terms of robustness and efficiency through a simulation study. A real-data application is also provided to compare the robust estimates with the traditional least squares estimator.
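As a concrete illustration of M-estimation, one family in the review, the following sketch (simulated data with hypothetical contamination) implements the Huber M-estimate by iteratively reweighted least squares with a MAD scale estimate and compares it with least squares:

```python
import numpy as np

def huber_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate computed by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # start from OLS
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust MAD scale
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))      # Huber weights
        XtW = X.T * w                                      # X' diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)       # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(2024)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
y[:10] += 25                                   # 5% gross outliers in y
X = np.column_stack([np.ones(n), x])

print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0].round(2))
print("Huber:", huber_irls(X, y).round(2))     # close to (2, 3)
```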
29. Semi-parametric estimation in Tobit regression models
Chen, Chunxia (January 1900)
Master of Science / Department of Statistics / Weixing Song
In the classical Tobit regression model, the regression error term is often assumed to have a zero-mean normal distribution with unknown variance, and the regression function is assumed to be linear. If the normality assumption is violated, the commonly used maximum likelihood estimate becomes inconsistent. Moreover, the likelihood function is very complicated if the regression function is nonlinear, even when the error density is normal, which makes the maximum likelihood estimation procedure hard to implement. In the fully nonparametric setup, when both the regression function and the distribution of the error term ε are unknown, some nonparametric estimators of the regression function have been proposed. Although assuming a known error distribution is restrictive, it is widely adopted in the Tobit regression literature and is supported by many empirical studies in econometric research; in fact, a majority of the relevant research assumes that ε has a normal distribution with mean 0 and unknown standard deviation. In this report, we develop a semi-parametric estimation procedure for the regression function by assuming that the error term follows a distribution from a class of zero-mean symmetric location-scale families. A minimum distance procedure for estimating the parameters of the regression function when it has a specified parametric form is also constructed. Compared with the existing semiparametric and nonparametric methods in the literature, our method is more efficient in that more information, in particular knowledge of the distribution of ε, is used. Moreover, the computation is relatively inexpensive. Given that many applications do assume that ε has a normal or other known distribution, the current work provides practical tools for statistical inference in the Tobit regression model.
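For reference, the classical parametric benchmark that the report departs from is the normal-error Tobit maximum likelihood estimator; a minimal sketch (simulated data, left-censoring at zero) is:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(0, 1, n)
y_star = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # latent response
y = np.maximum(y_star, 0.0)                      # observed, censored at zero

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                 # keep sigma positive
    mu = b0 + b1 * x
    obs, cens = y > 0, y == 0
    # Density contribution for uncensored points, cdf mass for censored ones.
    ll = norm.logpdf((y[obs] - mu[obs]) / s).sum() - obs.sum() * np.log(s)
    ll += norm.logcdf(-mu[cens] / s).sum()
    return -ll

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0, b1, s = res.x[0], res.x[1], np.exp(res.x[2])
print(f"beta0={b0:.2f}, beta1={b1:.2f}, sigma={s:.2f}")  # near (1, 2, 1.5)
```

Under non-normal errors this estimator is the one the abstract describes as inconsistent, which motivates the semi-parametric alternative.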
30. Model adequacy tests for exponential family regression models
Magalla, Champa Hemanthi (January 1900)
Doctor of Philosophy / Department of Statistics / James Neill
The problem of testing for lack of fit in exponential family regression models is considered. Such nonlinear models are the natural extension of Normal nonlinear regression models and generalized linear models. As is usually the case, inadequately specified models have an adverse impact on statistical inference and scientific discovery. The models of interest are curved exponential families determined by a sequence of predictor settings and a mean regression function, considered as a sub-manifold of the full exponential family. Constructed general alternative models are based on clusterings of the mean parameter components and allow likelihood ratio testing for lack of fit associated with the mean, equivalently the natural parameter, of a proposed null model. A maximin clustering methodology is defined in this context to determine suitable clusterings for assessing lack of fit.
In addition, a geometrically motivated goodness of fit test statistic for exponential family regression based on the information metric is introduced. Applied to logistic regression and Poisson regression, this statistic can be seen to equal a form of the Pearson χ² statistic in both cases; the same statement holds for multinomial regression. The problem of testing for equal means in a heteroscedastic Normal model is also discussed; in particular, a saturated three-parameter exponential family model is developed which allows testing for equal means with unequal variances.
A simulation study was carried out for the logistic and Poisson regression models to investigate the comparative performance of the likelihood ratio test, the deviance test, and the goodness of fit test based on the information metric. For logistic regression, the Hosmer-Lemeshow test was also included in the simulations. Notably, the likelihood ratio test had power comparable to that of the Hosmer-Lemeshow test under both m- and n-asymptotics, with superior power for the constructed alternatives. A distance function defined between densities and based on the information metric is also given: for logistic models, as the natural parameters go to plus or minus infinity, the densities become more and more deterministic, and limits of this distance function are shown to play an important role in the lack of fit analysis. A further simulation study investigated the power of a likelihood ratio test and a geometrically derived test based on the information metric for testing equal means in heteroscedastic Normal models.
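The information-metric statistic itself is developed in the thesis; the sketch below shows only the Pearson χ² form it reduces to for logistic regression with replicated predictor settings (simulated data, m-asymptotics):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(9)
x = np.linspace(-2, 2, 8)        # 8 distinct predictor settings
m = 25                           # replicates per setting (m-asymptotics)
p_true = 1 / (1 + np.exp(-(0.3 + 1.1 * x)))
y = rng.binomial(m, p_true)      # success counts at each setting

X = sm.add_constant(x)
fit = sm.GLM(np.column_stack([y, m - y]), X,
             family=sm.families.Binomial()).fit()
p_hat = fit.fittedvalues         # fitted success probabilities

# Pearson chi-square lack-of-fit statistic, referred to a chi-square
# distribution with (settings - parameters) degrees of freedom.
X2 = np.sum((y - m * p_hat)**2 / (m * p_hat * (1 - p_hat)))
df = len(x) - X.shape[1]
print(f"X^2 = {X2:.2f}, df = {df}, p = {chi2.sf(X2, df):.3f}")
```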