1 |
Topics in measurement error and missing data problems
Liu, Lian (15 May 2009)
No description available.
|
3 |
Secondary Analysis of Case-Control Studies in Genomic Contexts
Wei, Jiawei (August 2010)
This dissertation consists of five independent projects. In each project, a novel
statistical method was developed to address a practical problem encountered in genomic
contexts. For example, we considered testing for constant nonparametric effects
in a general semiparametric regression model in genetic epidemiology; analyzed the
relationship between covariates in the secondary analysis of case-control data; performed
model selection in joint modeling of paired functional data; and assessed the
prediction ability of genes in gene expression data generated by the CodeLink System
from GE.
In the first project, in Chapter II, we considered the problem of testing for constant
nonparametric effects in a general semiparametric regression model when there is
potential for interaction between the parametrically and nonparametrically modeled
variables. We derived a generalized likelihood ratio test for this hypothesis, showed
how to implement it, and gave evidence that it can improve statistical power compared
with standard partially linear models.
The second project, in Chapter III, addressed score testing for the
independence of X and Y in the secondary analysis of case-control data. Semiparametric
efficient approaches can be used to construct semiparametric score tests, but
they lack robustness to the assumed model for Y given X. We showed
how to adjust the semiparametric score test so that its level (Type I error rate) is correct even if the assumed model for Y given X is incorrect; the adjusted test is therefore robust.
The third project, in Chapter IV, took up the estimation of a regression
function when Y given X follows a homoscedastic regression model. We showed how
to estimate the regression parameters in the rare-disease case even if the assumed model
for Y given X is incorrect, so that the estimates are model-robust.
In the fourth project, in Chapter V, we developed novel AIC- and BIC-type methods
for estimating the smoothing parameters in a joint model of paired, hierarchical
sparse functional data, and showed in our numerical work that they are many times
faster than 10-fold cross-validation while giving results that are
remarkably close to the cross-validated estimates.
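The speed argument for AIC-type smoothing-parameter selection can be illustrated on a far simpler problem than the paired, hierarchical functional model of Chapter V. The sketch below assumes a single curve fitted by a penalized linear spline with a ridge penalty λ on the knot coefficients, takes the effective degrees of freedom to be the trace of the hat matrix, and compares the λ minimizing AIC = n log(RSS/n) + 2·df with the λ minimizing 10-fold cross-validation error. The function names, knot placement and simulated data are hypothetical; this is not the dissertation's criterion.

```python
import numpy as np


def spline_matrices(x, knots):
    """Fixed part [1, x] followed by the penalized truncated-line part (x - knot)_+."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.hstack([X, Z])


def fit_pspline(x, y, knots, lam):
    """Ridge-penalized spline fit; returns coefficients and effective degrees of freedom."""
    C = spline_matrices(x, knots)
    D = np.diag(np.r_[0.0, 0.0, np.ones(len(knots))])  # penalize only the knot coefficients
    A = C.T @ C + lam * D
    coef = np.linalg.solve(A, C.T @ y)
    edf = np.trace(np.linalg.solve(A, C.T @ C))         # trace of the hat matrix
    return coef, edf


def aic(x, y, knots, lam):
    coef, edf = fit_pspline(x, y, knots, lam)
    rss = np.sum((y - spline_matrices(x, knots) @ coef) ** 2)
    return len(y) * np.log(rss / len(y)) + 2.0 * edf


def cv10(x, y, knots, lam, seed=0):
    """10-fold cross-validated mean squared prediction error."""
    folds = np.random.default_rng(seed).permutation(len(y)) % 10
    sse = 0.0
    for k in range(10):
        train, test = folds != k, folds == k
        coef, _ = fit_pspline(x[train], y[train], knots, lam)
        sse += np.sum((y[test] - spline_matrices(x[test], knots) @ coef) ** 2)
    return sse / len(y)


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
knots = np.quantile(x, np.linspace(0, 1, 22)[1:-1])     # 20 interior knots
lams = 10.0 ** np.arange(-6.0, 3.0, 0.5)
lam_aic = lams[np.argmin([aic(x, y, knots, lam) for lam in lams])]
lam_cv = lams[np.argmin([cv10(x, y, knots, lam) for lam in lams])]
fit_aic = spline_matrices(x, knots) @ fit_pspline(x, y, knots, lam_aic)[0]
fit_cv = spline_matrices(x, knots) @ fit_pspline(x, y, knots, lam_cv)[0]
```

AIC needs one fit per candidate λ, whereas 10-fold cross-validation needs ten, which is where the speed difference comes from in this simplified setting.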
In the fifth project in Chapter VI we introduced a practical permutation test
that uses cross-validated genetic predictors to determine if the list of genes in question
has “good” prediction ability. It avoids overfitting by using cross-validation to
derive the genetic predictor and determines if the count of genes that give “good”
prediction could have been obtained by chance. This test was then used to explore
gene expression of colonic tissue and exfoliated colonocytes in the fecal stream to
discover similarities between the two.
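A minimal sketch of such a permutation test, assuming a concrete form the abstract does not specify: each gene gets its own simple linear predictor of a continuous response, the predictor is evaluated by leave-one-out cross-validation (computed with the leverage shortcut), a gene counts as “good” when its cross-validated correlation with the response exceeds a chosen threshold, and the null distribution of that count is obtained by permuting the response. The threshold, the simulated data and all function names are hypothetical.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions from a separate simple linear regression of y on each gene.

    Uses the leverage shortcut: LOO residual = residual / (1 - h_ii).
    """
    n = X.shape[0]
    xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(xc ** 2, axis=0)
    fitted = y.mean() + xc * (xc.T @ yc / sxx)        # (n, genes) in-sample fits
    leverage = 1.0 / n + xc ** 2 / sxx
    return y[:, None] - (y[:, None] - fitted) / (1.0 - leverage)


def count_good(X, y, threshold):
    """Number of genes whose cross-validated prediction correlates with y above `threshold`."""
    pc = loo_predictions(X, y)
    pc = pc - pc.mean(axis=0)
    yc = y - y.mean()
    r = (pc.T @ yc) / np.sqrt(np.sum(pc ** 2, axis=0) * np.sum(yc ** 2))
    return int(np.sum(r > threshold))


def permutation_test(X, y, threshold=0.3, n_perm=199, seed=0):
    """Observed count and permutation p-value (response permuted, genes kept intact)."""
    rng = np.random.default_rng(seed)
    observed = count_good(X, y, threshold)
    null = [count_good(X, rng.permutation(y), threshold) for _ in range(n_perm)]
    p_value = (1 + np.sum(np.array(null) >= observed)) / (n_perm + 1)
    return observed, p_value


rng = np.random.default_rng(0)
X = rng.standard_normal((80, 50))                    # 80 samples x 50 genes
y = X[:, :3].sum(axis=1) + rng.standard_normal(80)   # only the first 3 genes carry signal
observed, p_value = permutation_test(X, y)
```

Because the cross-validated predictor is rebuilt for every permuted response, the null count reflects chance agreement after cross-validation rather than in-sample fit.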
|
4 |
Sensitivity Analysis of Untestable Assumptions in Causal Inference
Lundin, Mathias (January 2011)
This thesis contributes to the research field of causal inference, which is concerned with the effect of a treatment on an outcome. Many such effects cannot be estimated through randomised experiments. For example, the effect of higher education on future income needs to be estimated using observational data. In the estimation, assumptions are made to make individuals who get higher education comparable with those who do not, so that the effect is estimable. Another assumption often made in causal inference (in both randomised and nonrandomised studies) is that the treatment received by one individual has no effect on the outcome of others. If this assumption is not met, the meaning of the causal effect of the treatment may be unclear.
In the first paper, the effect of college choice on income is investigated using Swedish register data, by comparing graduates from old and new Swedish universities. A semiparametric method of estimation is used, thereby relaxing functional-form assumptions for the data. One assumption often made in causal inference in observational studies is that individuals in different treatment groups are comparable, given that a set of pretreatment variables has been adjusted for in the analysis. This so-called unconfoundedness assumption cannot, in principle, be tested; therefore, in the second paper we propose a Bayesian sensitivity analysis of the unconfoundedness assumption. This analysis is then performed on the results from the first paper. In the third paper of the thesis, we study profile likelihood as a tool for semiparametric estimation of a causal effect of a treatment. A semiparametric version of the Bayesian sensitivity analysis of the unconfoundedness assumption proposed in Paper II is also performed using profile likelihood.
The last paper of the thesis is concerned with the estimation of direct and indirect causal effects of a treatment where interference between units is present, i.e., where the treatment of one individual affects the outcome of other individuals. We give unbiased estimators of these direct and indirect effects for situations where treatment probabilities vary between individuals. We also illustrate in a simulation study how direct and indirect causal effects can be estimated when treatment probabilities need to be estimated using background information on individuals.
|
5 |
Model choice and variable selection in mixed & semiparametric models
Säfken, Benjamin (27 March 2015)
No description available.
|
6 |
Statistical comparisons for nonlinear curves and surfaces
Zhao, Shi (31 May 2018)
Indiana University-Purdue University Indianapolis (IUPUI) / Estimation of nonlinear curves and surfaces has long been the focus of semiparametric
and nonparametric regression. Advances in the related model-fitting methodology
have greatly enhanced analysts’ modeling flexibility and have led to scientific discoveries
that would otherwise have been missed by traditional linear model analysis. Testing methods
concerning nonlinear functions, particularly for comparisons of curves and surfaces, have been
less forthcoming. Few of the existing methods have been carefully
disseminated, and most of them are subject to important limitations. In terms of
implementation, few off-the-shelf computational tools have been developed with syntax
similar to that of commonly used model-fitting packages, so these methods are less accessible to
practical data analysts. In this dissertation, I reviewed and tested the existing methods
for nonlinear function comparison and examined their operational characteristics. Some
theoretical justifications were provided for the new testing procedures. Real data examples were
included to illustrate the use of the newly developed software. A new R package
and a more user-friendly interface were created for enhanced accessibility. / 2020-08-22
|
7 |
Zonal And Regional Load Forecasting In The New England Wholesale Electricity Market: A Semiparametric Regression Approach
Farland, Jonathan (01 January 2013)
Power system planning, reliability analysis and economically efficient capacity scheduling all rely heavily on electricity demand forecasting models. In the context of a deregulated wholesale electricity market, scheduling a region’s bulk electricity generation is inherently linked to future values of demand. Predictive models are used by municipalities and suppliers to bid into the day-ahead market and by utilities to arrange contractual interchanges among neighboring utilities. These numerical predictions are therefore pervasive in the energy industry.
This research seeks to develop a regression-based forecasting model. Specifically, electricity demand is modeled as a function of calendar effects, lagged demand effects, weather effects, and a stochastic disturbance. Variables such as temperature, wind speed, cloud cover and humidity are known to be among the strongest predictors of electricity demand and are therefore used as model inputs. It is well known, however, that the relationship between demand and weather can be highly nonlinear. Rather than assuming a linear functional form, this research explores the structural change in these relationships. Variables that show a nonlinear relationship with demand are accommodated with penalized splines in a semiparametric regression framework. The equivalence between penalized splines and a special case of the mixed model formulation allows the model to be estimated with currently available statistical packages such as R, Stata and SAS.
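The penalized-spline and mixed-model equivalence can be checked numerically. The sketch below assumes a truncated-line spline basis for a single temperature effect (calendar, lag and other weather terms are omitted), chooses the variance ratio λ = σ²_ε/σ²_u by maximizing the profile likelihood (ML, not REML) over a grid, and then verifies that penalized least squares and the GLS-plus-BLUP fit of the corresponding mixed model give the same fitted values. The simulated temperature and load data and all function names are hypothetical.

```python
import numpy as np


def spline_design(x, knots):
    X = np.column_stack([np.ones_like(x), x])               # unpenalized (fixed-effect) columns
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)        # truncated-line (random-effect) columns
    return X, Z


def penalized_fit(y, X, Z, lam):
    """Penalized least squares: minimize |y - Xb - Zu|^2 + lam * |u|^2."""
    C = np.hstack([X, Z])
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    return C @ np.linalg.solve(C.T @ C + lam * D, C.T @ y)


def gls_and_residual(y, X, Z, lam):
    """GLS estimate of b when Var(y) = s2 * V with V = I + Z Z' / lam."""
    V = np.eye(len(y)) + Z @ Z.T / lam
    Vinv_X = np.linalg.solve(V, X)
    b = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)
    return V, b, y - X @ b


def mixed_model_fit(y, X, Z, lam):
    """Fitted values X b + Z u, with u the BLUP in y = Xb + Zu + e and lam = s2_e / s2_u."""
    V, b, r = gls_and_residual(y, X, Z, lam)
    u = Z.T @ np.linalg.solve(V, r) / lam
    return X @ b + Z @ u


def profile_loglik(y, X, Z, lam):
    """Gaussian log-likelihood maximized over b and s2 (ML, up to a constant)."""
    V, _, r = gls_and_residual(y, X, Z, lam)
    s2 = r @ np.linalg.solve(V, r) / len(y)
    return -0.5 * (len(y) * np.log(s2) + np.linalg.slogdet(V)[1])


rng = np.random.default_rng(2)
temp = np.sort(rng.uniform(-10.0, 35.0, 150))               # hypothetical temperatures
truth = 100.0 + 0.08 * (temp - 18.0) ** 2                    # hypothetical U-shaped load curve
load = truth + rng.normal(0.0, 2.0, temp.size)
X, Z = spline_design(temp, np.quantile(temp, np.linspace(0, 1, 17)[1:-1]))
lams = 10.0 ** np.arange(-2.0, 6.0, 0.25)
lam_ml = lams[np.argmax([profile_loglik(load, X, Z, lam) for lam in lams])]
fit_pls = penalized_fit(load, X, Z, lam_ml)
fit_lmm = mixed_model_fit(load, X, Z, lam_ml)
```

Treating the knot coefficients as random effects is what lets general mixed-model software estimate the amount of smoothing as a variance ratio instead of requiring a separate search.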
Historical data are available for the entire New England region as well as for the smaller zones that collectively make up the regional grid. A secondary research objective of this thesis is therefore to explore whether an aggregation of zonal forecasts might perform better than forecasts produced from a single regional model. Prior to this research, neither the applicability of a semiparametric regression-based approach to load forecasting nor the potential improvement in forecasting performance resulting from zonal load forecasting had been investigated for the New England wholesale electricity market.
|
8 |
Semiparametric Techniques for Response Surface Methodology
Pickle, Stephanie M. (14 September 2006)
Many industrial statisticians employ the techniques of Response Surface Methodology (RSM) to study and optimize products and processes. A second-order Taylor series approximation is commonly used to model the data; however, parametric models are not always adequate. In these situations, any degree of model misspecification may result in serious bias of the estimated response. Nonparametric methods have been suggested as an alternative, as they can capture structure in the data that a misspecified parametric model cannot. Yet nonparametric fits may be highly variable, especially in the small-sample settings that are common in RSM. Therefore, semiparametric regression techniques are proposed for use in the RSM setting. These methods will be applied to an elementary RSM problem as well as to the robust parameter design problem. / Ph. D.
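One semiparametric recipe of this kind is model robust regression, which adds a fraction of a nonparametric fit to the residuals of the parametric second-order fit. The sketch below assumes that form with a single coded factor, a Gaussian-kernel local linear smoother, and a fixed bandwidth and mixing fraction (data-driven choices of both are omitted). It is not necessarily the procedure developed in the dissertation, and the names and simulated data are hypothetical.

```python
import numpy as np


def quadratic_fit(x, y):
    """Parametric second-order model fitted by ordinary least squares."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def local_linear(x, r, bandwidth):
    """Gaussian-kernel local linear smooth of r on x, evaluated at the design points."""
    smooth = np.empty_like(r)
    for i, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        A = np.column_stack([np.ones_like(x), x - x0])
        WA = A * w[:, None]
        smooth[i] = np.linalg.solve(A.T @ WA, WA.T @ r)[0]   # intercept = fit at x0
    return smooth


def model_robust_fit(x, y, bandwidth, mix):
    """Quadratic fit plus a fraction `mix` (0 to 1) of a smooth of its residuals."""
    parametric = quadratic_fit(x, y)
    return parametric + mix * local_linear(x, y - parametric, bandwidth)


def rmse(fit, truth):
    return np.sqrt(np.mean((fit - truth) ** 2))


rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)                                 # small one-factor design, coded units
truth = 2 + 3 * x - 2 * x ** 2 + 0.5 * np.sin(4 * np.pi * x)    # structure a quadratic cannot capture
y = truth + rng.normal(0.0, 0.1, x.size)
print(rmse(quadratic_fit(x, y), truth), rmse(model_robust_fit(x, y, 0.04, 1.0), truth))
```

With `mix=0` the fit reduces to the purely parametric model; larger values let the smoother repair structure the quadratic misses, at the cost of extra variance.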
|
9 |
Efficient inference in general semiparametric regression models
Maity, Arnab (15 May 2009)
Semiparametric regression has become very popular in the field of statistics over the
years. While more and more sophisticated models are being developed,
the resulting theory and estimation procedures have become more and
more involved. The main problems addressed in this work concern
efficient inferential procedures in general semiparametric regression problems.
We first discuss efficient estimation of population-level summaries in general semiparametric
regression models. Here our focus is on estimating general population-level
quantities that combine the parametric and nonparametric parts of the model (e.g.,
population mean, probabilities, etc.). We place this problem in a general context,
provide a general kernel-based methodology, and derive the asymptotic distributions
of estimates of these population-level quantities, showing that in many cases the estimates
are semiparametric efficient.
Next, motivated by the problem of testing for genetic effects on complex traits in
the presence of gene-environment interaction, we develop score tests in
general semiparametric regression problems that involve a Tukey-style 1-d.f. form of
interaction between parametrically and nonparametrically modeled covariates. We
develop adjusted score statistics that are unbiased and asymptotically efficient and
can be computed using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the
target functions, which in turn allow us to develop estimation procedures that can be
easily implemented using standard computational methods.
Finally, we take up the important problem of estimation in a general semiparametric
regression model when covariates are measured with an additive measurement error
structure having normally distributed measurement errors. In contrast to methods
that require solving integral equations whose dimension equals the size of the covariate measured
with error, we propose methodology based on Monte Carlo corrected scores to estimate
the model components and investigate the asymptotic behavior of the estimates.
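The idea behind Monte Carlo corrected scores with normal measurement error can be shown on the simplest possible model. The sketch below assumes simple linear regression with W = X + U, U ~ N(0, σ_u²) and σ_u known, and uses the complex-variable construction: substituting W + iσ_u·Z, with Z ~ N(0, 1) simulated independently, into the true-data score and keeping the real part gives an estimating function whose conditional mean given X equals the true-data score, since E[Re f(W + iσ_u·Z) | X] = f(X) for the polynomial score used here. It illustrates the device only, not the dissertation's semiparametric estimator; the names and simulated data are hypothetical.

```python
import numpy as np


def mccs_linear(w, y, sigma_u, n_mc=50, seed=0):
    """Monte Carlo corrected score estimate of (intercept, slope) when W = X + U, U ~ N(0, sigma_u^2)."""
    rng = np.random.default_rng(seed)
    w_tilde = w[:, None] + 1j * sigma_u * rng.standard_normal((w.size, n_mc))
    m1 = np.real(w_tilde).mean(axis=1)          # stands in for x in the score (equals w)
    m2 = np.real(w_tilde ** 2).mean(axis=1)     # stands in for x^2: w^2 - sigma_u^2 * mean(Z^2)
    # The true-data score sum_i (y_i - b0 - b1 x_i) * (1, x_i) is linear in (b0, b1),
    # so setting the averaged corrected score to zero is a 2 x 2 linear system.
    A = np.array([[w.size, m1.sum()], [m1.sum(), m2.sum()]])
    rhs = np.array([y.sum(), (y * m1).sum()])
    return np.linalg.solve(A, rhs)


rng = np.random.default_rng(4)
n, sigma_u = 20000, 0.5
x = rng.standard_normal(n)                      # true covariate (unobserved in practice)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
w = x + rng.normal(0.0, sigma_u, n)             # observed error-prone covariate
naive = np.polyfit(w, y, 1)[::-1]               # (intercept, slope); slope attenuated toward 0
corrected = mccs_linear(w, y, sigma_u)
```

Here the naive slope is pulled toward 2/(1 + 0.25) = 1.6, while the corrected score removes the σ_u² inflation in the x² term without ever requiring the distribution of X.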
For each of the problems, we present simulation studies to evaluate the performance of
the proposed inferential procedures. In addition, we apply our proposed methodology
to nontrivial real-life data sets and present the results.
|
10 |
Modern Monte Carlo Methods and Their Application in Semiparametric Regression
Thomas, Samuel Joseph (05 1900)
Indiana University-Purdue University Indianapolis (IUPUI) / The essence of Bayesian data analysis is to ascertain posterior distributions. Posteriors
generally do not have closed-form expressions for direct computation in practical applications.
Analysts, therefore, resort to Markov chain Monte Carlo (MCMC) methods for the generation
of sample observations that approximate the desired posterior distribution. Standard MCMC
methods simulate sample values from the desired posterior distribution via random proposals.
As a result, the mechanism used to generate the proposals inevitably determines the
efficiency of the algorithm. One of the modern MCMC techniques designed to explore
high-dimensional parameter spaces more efficiently is Hamiltonian Monte Carlo (HMC), based on
Hamilton's equations of motion. Inspired by classical mechanics, these equations
incorporate a latent momentum variable to generate MCMC proposals that are likely to be accepted.
This dissertation discusses how such a powerful computational approach can be used for
implementing statistical models. Along this line, I created a unified computational procedure
for using HMC to fit various types of statistical models. The procedure that I proposed can
be applied to a broad class of models, including linear models, generalized linear models,
mixed-effects models, and various types of semiparametric regression models. To facilitate
the fitting of a diverse set of models, I incorporated new parameterization and decomposition
schemes to ensure the numerical performance of Bayesian model fitting without sacrificing
the procedure’s general applicability. As a concrete application, I demonstrate how to use the
proposed procedure to fit a multivariate generalized additive model (GAM), a nonstandard
statistical model with a complex covariance structure and numerous parameters. Byproducts of the research include two software packages that allow practical data analysts to use the
proposed computational method to fit their own models. The research’s main methodological
contribution is the unified computational approach that it presents for Bayesian model
fitting, which can be used for standard and nonstandard statistical models. The availability of
such a procedure has greatly enhanced statistical modelers’ toolbox for implementing new
and nonstandard statistical models.
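The basic HMC update described above (a Gaussian latent momentum, leapfrog integration of Hamilton's equations, and a Metropolis accept/reject step on the total energy) can be sketched as follows. This is a textbook sampler with a fixed step size and number of leapfrog steps, not the unified procedure, parameterizations or software packages developed in the dissertation; the target (a correlated bivariate normal) and all names are hypothetical.

```python
import numpy as np


def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size=0.1, n_leapfrog=20, seed=0):
    """Basic HMC with an identity mass matrix; returns draws and the acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    draws = np.empty((n_samples, x.size))
    accepted = 0
    for i in range(n_samples):
        p = rng.standard_normal(x.size)                   # latent momentum ~ N(0, I)
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step_size * grad_log_prob(x_new)   # leapfrog: half step in momentum
        for step in range(n_leapfrog):
            x_new += step_size * p_new                     # full step in position
            if step < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)   # final half step in momentum
        # Hamiltonian = potential energy (-log density) + kinetic energy
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:         # Metropolis correction
            x = x_new
            accepted += 1
        draws[i] = x
    return draws, accepted / n_samples


cov = np.array([[1.0, 0.8], [0.8, 1.0]])                   # hypothetical target covariance
prec = np.linalg.inv(cov)


def log_prob(x):
    return -0.5 * x @ prec @ x


def grad_log_prob(x):
    return -prec @ x


draws, accept_rate = hmc_sample(log_prob, grad_log_prob, np.zeros(2), n_samples=5000)
```

The leapfrog integrator nearly conserves the Hamiltonian, which is why distant proposals are still accepted at a high rate; the step size and trajectory length remain tuning choices.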
|