1 |
A score test of homogeneity in generalized additive models for zero-inflated count data
Nian, Gaowei January 1900
Master of Science / Department of Statistics / Wei-Wen Hsu / Zero-inflated Poisson (ZIP) models are often used to analyze count data with excess zeros. In the ZIP model, the Poisson mean and the mixing weight are usually assumed to depend on covariates through regression: the effect of covariates on the Poisson mean or the mixing weight is specified by a link function coupled with a linear predictor, that is, a linear combination of unknown regression coefficients and covariates. In practice, however, the predictor may be curvilinear or otherwise nonlinear rather than linear in the regression parameters. In such situations a more general and flexible approach is needed. One popular method in the literature is the zero-inflated generalized additive model (ZIGAM), which extends zero-inflated models by incorporating generalized additive models (GAM) and can therefore accommodate a nonlinear predictor in the link function. For ZIGAM, it is also of interest to conduct inference on the mixing weight, in particular to evaluate whether the mixing weight equals zero. Many methods have been proposed to examine this question, but all of them were developed under classical zero-inflated models rather than ZIGAM. In this report, we propose a generalized score test to evaluate whether the mixing weight equals zero within the ZIGAM framework with a Poisson model. Technically, the proposed score test is based on a novel transformation of the mixing weight coupled with proportional constraints on ZIGAM, under which the smooth components of covariates in the Poisson mean and in the mixing weight are assumed to be proportional. An intensive simulation study indicates that the proposed score test outperforms existing tests when the mixing weight and the Poisson mean truly involve a nonlinear predictor. The recreational fisheries data from the Marine Recreational Information Program (MRIP) survey conducted by the National Oceanic and Atmospheric Administration (NOAA) are used to illustrate the proposed methodology.
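For orientation, the classical test that this report generalizes can be sketched in its simplest form: a ZIP model with a constant Poisson mean and no covariates. The sketch below is not the ZIGAM test proposed in the report. It is the intercept-only score statistic usually attributed to van den Broek (1995), S = (n0 - n*p0)^2 / (n*p0*(1 - p0) - n*ybar*p0^2) with p0 = exp(-ybar), referred to a chi-squared distribution with one degree of freedom. The function name and the data are illustrative assumptions.

```python
import math


def zip_score_test(counts):
    """Score test of H0: mixing weight = 0 in an intercept-only ZIP model.

    Returns (statistic, p_value); the p-value uses the chi-squared(1) reference.
    """
    n = len(counts)
    mean = sum(counts) / n          # Poisson MLE of the mean under H0
    if mean == 0:
        raise ValueError("all counts are zero; the statistic is undefined")
    p0 = math.exp(-mean)            # fitted P(Y = 0) under the Poisson null
    n0 = sum(1 for y in counts if y == 0)
    statistic = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * mean * p0 ** 2)
    # For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]   # made-up data with many zeros
statistic, p_value = zip_score_test(counts)
print(f"score statistic = {statistic:.3f}, p-value = {p_value:.4f}")
```

Only the null (plain Poisson) model has to be fitted, which is the usual practical appeal of a score test over a likelihood ratio test.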
|
2 |
Secondary Analysis of Case-Control Studies in Genomic Contexts
Wei, Jiawei 2010 August 1900
This dissertation consists of five independent projects. In each project, a novel
statistical method was developed to address a practical problem encountered in genomic
contexts. For example, we considered testing for constant nonparametric effects
in a general semiparametric regression model in genetic epidemiology; analyzed the
relationship between covariates in the secondary analysis of case-control data; performed
model selection in joint modeling of paired functional data; and assessed the
prediction ability of genes in gene expression data generated by the CodeLink System
from GE.
In the first project in Chapter II we considered the problem of testing for constant
nonparametric effects in a general semiparametric regression model when there is the
potential for interaction between the parametrically and nonparametrically modeled
variables. We derived a generalized likelihood ratio test for this hypothesis, showed
how to implement it, and gave evidence that it can improve statistical power when
compared to standard partially linear models.
The second project in Chapter III addressed the issue of score testing for the
independence of X and Y in the secondary analysis of case-control data. The semiparametric
efficient approaches can be used to construct semiparametric score tests, but
they suffer from a lack of robustness to the assumed model for Y given X. We showed
how to adjust the semiparametric score test to make its level/Type I error correct even if the assumed model for Y given X is incorrect, and thus the test is robust.
The third project in Chapter IV took up the issue of estimation of a regression
function when Y given X follows a homoscedastic regression model. We showed how
to estimate the regression parameters in a rare disease case even if the assumed model
for Y given X is incorrect, and thus the estimates are model-robust.
In the fourth project in Chapter V we developed novel AIC and BIC-type methods
for estimating the smoothing parameters in a joint model of paired, hierarchical
sparse functional data, and showed in our numerical work that they are many times
faster than 10-fold crossvalidation while at the same time giving results that are
remarkably close to the crossvalidated estimates.
In the fifth project in Chapter VI we introduced a practical permutation test
that uses cross-validated genetic predictors to determine if the list of genes in question
has “good” prediction ability. It avoids overfitting by using cross-validation to
derive the genetic predictor and determines if the count of genes that give “good”
prediction could have been obtained by chance. This test was then used to explore
gene expression of colonic tissue and exfoliated colonocytes in the fecal stream to
discover similarities between the two.
|
3 |
The impact of misspecification of nuisance parameters on test for homogeneity in zero-inflated Poisson model: a simulation study
Gao, Siyu January 1900
Master of Science / Department of Statistics / Wei-Wen Hsu / The zero-inflated Poisson (ZIP) model consists of a Poisson model and a degenerate distribution at zero. Under this model, zero counts are generated from two sources, representing heterogeneity in the population. In practice, it is often of interest to evaluate whether this heterogeneity is consistent with the observed data. Most existing methods for examining this heterogeneity assume that the Poisson mean is a function of nuisance parameters, which are simply the coefficients associated with covariates. However, these nuisance parameters can be misspecified when these methods are applied, and as a result the validity and power of the test may be affected. The impact of such misspecification has not been discussed in the literature. This report focuses on the impact of misspecification on the performance of the score test for homogeneity in ZIP models. Through an intensive simulation study, we find that: 1) under misspecification, the score test statistic no longer follows a chi-squared limiting distribution under the null, and a parametric bootstrap is suggested for finding the true null limiting distribution of the score test statistic; 2) the power of the test decreases as the number of covariates in the Poisson mean increases. The test with a constant Poisson mean has the highest power, even compared with the test with a well-specified mean. Finally, the simulation findings are applied to the Wuhan Inpatient Care Insurance data, which contain excess zeros.
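The parametric bootstrap mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the report: it uses the intercept-only score statistic (constant Poisson mean, no covariates), a textbook multiplication-method Poisson sampler, and hypothetical function names. The idea is to simulate samples from the fitted null model, recompute the statistic on each, and compare the observed value with that simulated null distribution instead of the chi-squared reference.

```python
import math
import random


def score_statistic(counts):
    """Intercept-only score statistic for zero inflation in a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; the statistic is undefined")
    p0 = math.exp(-mean)
    n0 = sum(1 for y in counts if y == 0)
    return (n0 - n * p0) ** 2 / (n * p0 * (1 - p0 - mean * p0))


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate by multiplying uniforms (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def bootstrap_p_value(counts, n_boot=999, seed=0):
    """Parametric bootstrap p-value under the fitted Poisson null model."""
    rng = random.Random(seed)
    n = len(counts)
    lam_hat = sum(counts) / n               # null-model fit
    observed = score_statistic(counts)
    exceed, done = 0, 0
    while done < n_boot:
        sample = [draw_poisson(lam_hat, rng) for _ in range(n)]
        if sum(sample) == 0:
            continue                        # statistic undefined; redraw
        done += 1
        if score_statistic(sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


counts = [0] * 80 + [5] * 20                # made-up data with heavy zero excess
print("bootstrap p-value:", bootstrap_p_value(counts, n_boot=199))
```

With covariates, the same loop would refit the assumed regression model for the Poisson mean on every bootstrap sample; that refit is omitted here.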
|
4 |
Score Test and Likelihood Ratio Test for Zero-Inflated Binomial Distribution and Geometric Distribution
Dai, Xiaogang 01 April 2018
The main purpose of this thesis is to compare the performance of the score test and the likelihood ratio test by computing type I and type II error rates when the tests are applied to the geometric distribution and the inflated binomial distribution. We first derive the test statistics of the score test and the likelihood ratio test for both distributions. We then use R to run a simulation study of the behavior of the two tests. We write R code to calculate the two types of error for each distribution, generating many samples to approximate the probabilities of type I and type II error as the parameter values change.
In the first chapter, we discuss the motivation behind the work presented in this thesis and introduce the definitions used throughout. In the second chapter, we derive test statistics for the likelihood ratio test and the score test for the geometric distribution. For the score test, we consider both the observed information matrix and the expected information matrix, and obtain the score test statistics zO and zI.
Chapter 3 discusses the likelihood ratio test and the score test for the inflated binomial distribution. The main parameter of interest is w, so p is a nuisance parameter in this case. We derive the likelihood ratio test statistics and the score test statistics to test w. In both tests, the nuisance parameter p is estimated by its maximum likelihood estimator p̂. We again consider the score test using both the observed and the expected information matrices.
Chapter 4 focuses on the score test in the inflated binomial distribution. We generate data from the zero-inflated binomial distribution using R and plot the ratio of the two score test statistics, zI/zO, against different values of n0, the number of zeros in the sample.
In chapter 5, we discuss and compare the use of the score test with the two types of information matrices. We perform a simulation study to estimate the two types of errors when applying the test to the geometric distribution and the inflated binomial distribution. We plot the percentages of the two errors while fixing different parameters, such as the probability p and the number of trials m.
Finally, we conclude by briefly summarizing the results in chapter 6.
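The kind of simulation described above can be sketched in a few lines. The thesis uses R; Python is used here only for illustration, and none of this is the author's code. The statistic is an assumption: a chi-squared-form score statistic for H0: w = 0 in a zero-inflated Binomial(m, p) model, built from the expected information with p replaced by its null MLE, S = [(n0 - n*p0)/p0]^2 / (n*[(1 - p0)/p0 - m*p/(1 - p)]) with p0 = (1 - p)^m. The thesis's zI may be defined differently, for example as a signed root. The sketch needs m >= 2, since the variance term vanishes for m = 1.

```python
import math
import random

CHI2_1_CRITICAL_5PCT = 3.841458820694124   # upper 5% point of chi-squared(1)


def zib_score_statistic(counts, m):
    """Expected-information score statistic for zero inflation, Binomial(m, p)."""
    n = len(counts)
    p_hat = sum(counts) / (n * m)           # MLE of p under H0 (no inflation)
    if m < 2 or not 0.0 < p_hat < 1.0:
        raise ValueError("statistic undefined for m < 2 or a degenerate sample")
    p0 = (1 - p_hat) ** m                   # fitted P(Y = 0) under H0
    n0 = sum(1 for y in counts if y == 0)
    score = (n0 - n * p0) / p0
    variance = n * ((1 - p0) / p0 - m * p_hat / (1 - p_hat))
    return score ** 2 / variance


def estimate_type_one_error(n, m, p, reps=1000, seed=1):
    """Fraction of null (non-inflated) samples rejected at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        counts = [sum(rng.random() < p for _ in range(m)) for _ in range(n)]
        try:
            if zib_score_statistic(counts, m) > CHI2_1_CRITICAL_5PCT:
                rejections += 1
        except ValueError:
            pass                            # degenerate sample: count as no rejection
    return rejections / reps


print("estimated type I error:", estimate_type_one_error(n=200, m=5, p=0.3))
```

Estimating the type II error follows the same pattern, with samples drawn from a zero-inflated binomial (each count set to zero with probability w) and non-rejections counted instead.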
|
5 |
The Box-Cox Transformation: A Review
曾能芳, Zeng, Neng-Fang Unknown Date
Transforming the data can often simplify an analysis, especially when
the original observations deviate from the assumptions underlying the
linear model. The Box-Cox transformation has received more attention
than other transformations. In this dissertation, we review the theory
of estimation and hypothesis testing for the transformation parameter,
and the sensitivity of the linear model parameters under the Box-Cox
transformation. Monte Carlo simulation is used to study the performance
of the transformations. We also examine whether the Box-Cox
transformation actually makes the transformed observations satisfy the
assumptions of the linear model.
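As a concrete illustration of estimating the transformation parameter, the sketch below maximizes the Box-Cox profile log-likelihood over a grid for the simplest linear model, a constant mean with normal errors. The transformation is (y^lambda - 1)/lambda for lambda != 0 and log(y) for lambda = 0, and the profile log-likelihood (up to a constant) is -(n/2)*log(sigma2_hat(lambda)) + (lambda - 1)*sum(log y). The mean-only model, the grid, and the function names are assumptions made for illustration; they are not taken from the dissertation.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of positive observations."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive observations")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam for a mean-only normal model (constant dropped)."""
    n = len(y)
    z = box_cox(y, lam)
    mean_z = sum(z) / n
    sigma2 = sum((v - mean_z) ** 2 for v in z) / n   # MLE of the error variance
    jacobian = (lam - 1) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(sigma2) + jacobian


def estimate_lambda(y, grid=None):
    """Grid-search maximizer of the profile log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]   # -2.00, -1.99, ..., 2.00
    return max(grid, key=lambda lam: profile_loglik(y, lam))


y = [math.exp(x) for x in (-2, -1, 0, 1, 2)]   # made-up data, symmetric on the log scale
print("estimated lambda:", estimate_lambda(y))
```

With regression covariates, sigma2 would be the residual variance from a least-squares fit of the transformed response, and a likelihood ratio test of a given lambda compares twice the drop in this profile log-likelihood with a chi-squared(1) quantile.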
|
6 |
Tests of random effects in linear and non-linear models
Häggström Lundevaller, Erling January 2002
No description available.
|
7 |
Generalized score tests for missing covariate data
Jin, Lei 15 May 2009
In this dissertation, the generalized score tests based on weighted estimating equations
are proposed for missing covariate data. Their properties, including the effects
of nuisance functions on the forms of the test statistics and efficiency of the tests,
are investigated. Different versions of the test statistic are properly defined for various
parametric and semiparametric settings. Their asymptotic distributions are also
derived. It is shown that when models for the nuisance functions are correct, appropriate
test statistics can be obtained via plugging the estimates of the nuisance
functions into the appropriate test statistic for the case that the nuisance functions
are known. Furthermore, the optimal test is obtained using the relative efficiency
measure. As an application of the proposed tests, a formal model validation procedure
is developed for generalized linear models in the presence of missing covariates.
The asymptotic distribution of the data driven methods is provided. A simulation
study in both linear and logistic regressions illustrates the applicability and the finite
sample performance of the methodology. Our methods are also employed to analyze
a coronary artery disease diagnostic dataset.
|
8 |
Goodness-of-Fit Test Issues in Generalized Linear Mixed Models
Chen, Nai-Wei 2011 December 1900
Linear mixed models and generalized linear mixed models are random-effects models widely applied to analyze clustered or hierarchical data. Generally, random effects are often assumed to be normally distributed in the context of mixed models. However, in the mixed-effects logistic model, the violation of the assumption of normally distributed random effects may result in inconsistency for estimates of some fixed effects and the variance component of random effects when the variance of the random-effects distribution is large. On the other hand, summary statistics used for assessing goodness of fit in the ordinary logistic regression models may not be directly applicable to the mixed-effects logistic models. In this dissertation, we present our investigations of two independent studies related to goodness-of-fit tests in generalized linear mixed models.
First, we consider a semi-nonparametric density representation for the random effects distribution and provide a formal statistical test for testing normality of the random-effects distribution in the mixed-effects logistic models. We obtain estimates of parameters by using a non-likelihood-based estimation procedure. Additionally, we not only evaluate the type I error rate of the proposed test statistic through asymptotic results, but also carry out a bootstrap hypothesis testing procedure to control the inflation of the type I error rate and to study the power performance of the proposed test statistic. Further, the methodology is illustrated by revisiting a case study in mental health.
Second, to improve assessment of the model fit in the mixed-effects logistic models, we apply the nonparametric local polynomial smoothed residuals over within-cluster continuous covariates to the unweighted sum of squares statistic for assessing the goodness-of-fit of the logistic multilevel models. We perform a simulation study to evaluate the type I error rate and the power performance for detecting a missing quadratic or interaction term of fixed effects using the kernel smoothed unweighted sum of squares statistic based on the local polynomial smoothed residuals over x-space. We also use a real data set in clinical trials to illustrate this application.
|
9 |
Identifying Influential Observations in Nonlinear Regression: a focus on parameter estimates and the score test
Stål, Karin January 2015
This thesis contributes to influence analysis in nonlinear regression and in particular the detection of influential observations. The focus is on a regression model with a known mean function, which is nonlinear in its parameters and where the function is chosen according to the knowledge about the process generating the data. The error term in the regression model is assumed to be additive. The main goal of this thesis is to work out diagnostic measures for assessing the influence of observations on various results from a nonlinear regression analysis. The obtained results comprise diagnostic tools for detecting observations that, individually or jointly with some other observations, are influential on the parameter estimates. Moreover, assessing conditional influence, i.e. the influence of an observation conditional on the deletion of another observation, is of interest. This can help to identify influential observations which could be missed due to complex relationships among the observations. Novelties of the proposed diagnostic tools include the possibility to assess influence of observations on a specific parameter estimate and to assess influence of multiple observations. A further emphasis of this thesis is on the observations' influence on the outcome of a hypothesis testing procedure based on Rao's score test. An innovative solution to the problem of visual identification of influential observations regarding the score test statistic obtained in this thesis is the so called added parameter plot. As a complement to the added parameter plot, new diagnostic measures are derived for assessing the influence of single and multiple observations on the score test statistic.
|
10 |
New Score Tests for Genetic Linkage Analysis in a Likelihood Framework
Song, Yeunjoo E. 12 March 2013
No description available.
|