Global ETD Search

1	Topics in measurement error and missing data problems Liu, Lian 15 May 2009 (has links) No description available. Measurement Error Semiparametric Regression Missing Genotype Linkage Disequilibrium Mapping QTL
2	Topics in measurement error and missing data problems Liu, Lian 15 May 2009 (has links) No description available. Measurement Error Semiparametric Regression Missing Genotype Linkage Disequilibrium Mapping QTL
3	Secondary Analysis of Case-Control Studies in Genomic Contexts Wei, Jiawei 2010 August 1900 (has links) This dissertation consists of five independent projects. In each project, a novel statistical method was developed to address a practical problem encountered in genomic contexts. For example, we considered testing for constant nonparametric effects in a general semiparametric regression model in genetic epidemiology; analyzed the relationship between covariates in the secondary analysis of case-control data; performed model selection in joint modeling of paired functional data; and assessed the prediction ability of genes in gene expression data generated by the CodeLink System from GE. In the first project in Chapter II we considered the problem of testing for constant nonparametric effects in a general semiparametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. We derived a generalized likelihood ratio test for this hypothesis, showed how to implement it, and gave evidence that it can improve statistical power when compared to standard partially linear models. The second project in Chapter III addressed the issue of score testing for the independence of X and Y in the secondary analysis of case-control data. The semiparametric efficient approaches can be used to construct semiparametric score tests, but they suffer from a lack of robustness to the assumed model for Y given X. We showed how to adjust the semiparametric score test to make its level/Type I error correct even if the assumed model for Y given X is incorrect, and thus the test is robust. The third project in Chapter IV took up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model. We showed how to estimate the regression parameters in a rare disease case even if the assumed model for Y given X is incorrect, and thus the estimates are model-robust. 
In the fourth project in Chapter V we developed novel AIC and BIC-type methods for estimating the smoothing parameters in a joint model of paired, hierarchical sparse functional data, and showed in our numerical work that they are many times faster than 10-fold cross-validation while at the same time giving results that are remarkably close to the cross-validated estimates. In the fifth project in Chapter VI we introduced a practical permutation test that uses cross-validated genetic predictors to determine if the list of genes in question has “good” prediction ability. It avoids overfitting by using cross-validation to derive the genetic predictor and determines if the count of genes that give “good” prediction could have been obtained by chance. This test was then used to explore gene expression of colonic tissue and exfoliated colonocytes in the fecal stream to discover similarities between the two. Semiparametric Regression Case-Control Score Test Model Selection Classification
4	Sensitivity Analysis of Untestable Assumptions in Causal Inference Lundin, Mathias January 2011 (has links) This thesis contributes to the research field of causal inference, where the effect of a treatment on an outcome is of interest. Many such effects cannot be estimated through randomised experiments. For example, the effect of higher education on future income needs to be estimated using observational data. In the estimation, assumptions are made to make individuals that get higher education comparable with those not getting higher education, to make the effect estimable. Another assumption often made in causal inference (both in randomised and nonrandomised studies) is that the treatment received by one individual has no effect on the outcome of others. If this assumption is not met, the meaning of the causal effect of the treatment may be unclear. In the first paper the effect of college choice on income is investigated using Swedish register data, by comparing graduates from old and new Swedish universities. A semiparametric method of estimation is used, thereby relaxing functional assumptions for the data. One assumption often made in causal inference in observational studies is that individuals in different treatment groups are comparable, given that a set of pretreatment variables have been adjusted for in the analysis. This so-called unconfoundedness assumption is in principle not possible to test and, therefore, in the second paper we propose a Bayesian sensitivity analysis of the unconfoundedness assumption. This analysis is then performed on the results from the first paper. In the third paper of the thesis, we study profile likelihood as a tool for semiparametric estimation of a causal effect of a treatment. A semiparametric version of the Bayesian sensitivity analysis of the unconfoundedness assumption proposed in Paper II is also performed using profile likelihood. 
The last paper of the thesis is concerned with the estimation of direct and indirect causal effects of a treatment where interference between units is present, i.e., where the treatment of one individual affects the outcome of other individuals. We give unbiased estimators of these direct and indirect effects for situations where treatment probabilities vary between individuals. We also illustrate in a simulation study how direct and indirect causal effects can be estimated when treatment probabilities need to be estimated using background information on individuals. Observational studies semiparametric regression unconfoundedness Causal inference Statistics Statistik
5	Model choice and variable selection in mixed & semiparametric models Säfken, Benjamin 27 March 2015 (has links) No description available. 510 semiparametric regression mixed model model selection Mathematics (PPN61756535X)
6	Statistical comparisons for nonlinear curves and surfaces Zhao, Shi 31 May 2018 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Estimation of nonlinear curves and surfaces has long been the focus of semiparametric and nonparametric regression. The advances in related model fitting methodology have greatly enhanced the analyst’s modeling flexibility and have led to scientific discoveries that would be otherwise missed by the traditional linear model analysis. What has been less forthcoming are the testing methods concerning nonlinear functions, particularly for comparisons of curves and surfaces. Few of the existing methods are carefully disseminated, and most of these methods are subject to important limitations. In the implementation, few off-the-shelf computational tools have been developed with syntax similar to the commonly used model fitting packages, and thus are less accessible to practical data analysts. In this dissertation, I reviewed and tested the existing methods for nonlinear function comparison and examined their operational characteristics. Some theoretical justifications were provided for the new testing procedures. Real data examples were included illustrating the use of the newly developed software. A new R package and a more user-friendly interface were created for enhanced accessibility. / 2020-08-22 Comparison of nonlinear functions Resampling method Software development
7	Zonal And Regional Load Forecasting In The New England Wholesale Electricity Market: A Semiparametric Regression Approach Farland, Jonathan 01 January 2013 (has links) (PDF) Power system planning, reliability analysis and economically efficient capacity scheduling all rely heavily on electricity demand forecasting models. In the context of a deregulated wholesale electricity market, scheduling a region’s bulk electricity generation is inherently linked to future values of demand. Predictive models are used by municipalities and suppliers to bid into the day-ahead market and by utilities in order to arrange contractual interchanges among neighboring utilities. These numerical predictions are therefore pervasive in the energy industry. This research seeks to develop a regression-based forecasting model. Specifically, electricity demand is modeled as a function of calendar effects, lagged demand effects, weather effects, and a stochastic disturbance. Variables such as temperature, wind speed, cloud cover and humidity are known to be among the strongest predictors of electricity demand and as such are used as model inputs. It is well known, however, that the relationship between demand and weather can be highly nonlinear. Rather than assuming a linear functional form, the structural change in these relationships is explored. Those variables that indicate a nonlinear relationship with demand are accommodated with penalized splines in a semiparametric regression framework. The equivalence between penalized splines and the special case of a mixed model formulation allows for model estimation with currently available statistical packages such as R, STATA and SAS. Historical data are available for the entire New England region as well as for the smaller zones that collectively make up the regional grid. 
As such, a secondary research objective of this thesis is to explore whether or not an aggregation of zonal forecasts might perform better than those produced from a single regional model. Prior to this research, neither the applicability of a semiparametric regression-based approach towards load forecasting nor the potential improvement in forecasting performance resulting from zonal load forecasting has been investigated for the New England wholesale electricity market. Semiparametric Regression Load Forecasting Penalized Splines Mixed Models Econometrics
8	Semiparametric Techniques for Response Surface Methodology Pickle, Stephanie M. 14 September 2006 (has links) Many industrial statisticians employ the techniques of Response Surface Methodology (RSM) to study and optimize products and processes. A second-order Taylor series approximation is commonly utilized to model the data; however, parametric models are not always adequate. In these situations, any degree of model misspecification may result in serious bias of the estimated response. Nonparametric methods have been suggested as an alternative as they can capture structure in the data that a misspecified parametric model cannot. Yet nonparametric fits may be highly variable especially in small sample settings which are common in RSM. Therefore, semiparametric regression techniques are proposed for use in the RSM setting. These methods will be applied to an elementary RSM problem as well as the robust parameter design problem. / Ph. D. Genetic Algorithm Response Surface Methodology Semiparametric Regression Robust Parameter Design
9	Efficient inference in general semiparametric regression models Maity, Arnab 15 May 2009 (has links) Semiparametric regression has become very popular in the field of Statistics over the years. While on one hand more and more sophisticated models are being developed, on the other hand the resulting theory and estimation process have become more and more involved. The main problems that are addressed in this work are related to efficient inferential procedures in general semiparametric regression problems. We first discuss efficient estimation of population-level summaries in general semiparametric regression models. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model (e.g., population mean, probabilities, etc.). We place this problem in a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. Next, motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we consider developing score tests in general semiparametric regression problems that involve a Tukey-style 1 d.f. form of interaction between parametrically and nonparametrically modeled covariates. We develop adjusted score statistics which are unbiased and asymptotically efficient and can be performed using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented using standard computational methods. 
Finally, we take up the important problem of estimation in a general semiparametric regression model when covariates are measured with an additive measurement error structure having normally distributed measurement errors. In contrast to methods that require solving an integral equation of dimension equal to the size of the covariate measured with error, we propose methodology based on Monte Carlo corrected scores to estimate the model components and investigate the asymptotic behavior of the estimates. For each of the problems, we present simulation studies to observe the performance of the proposed inferential procedures. In addition, we apply our proposed methodology to analyze nontrivial real life data sets and present the results. Nonparametric/Semiparametric Regression Kernel Method Measurement Error Semiparametric Efficiency Repeated Measures
10	Modern Monte Carlo Methods and Their Application in Semiparametric Regression Thomas, Samuel Joseph 05 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / The essence of Bayesian data analysis is to ascertain posterior distributions. Posteriors generally do not have closed-form expressions for direct computation in practical applications. Analysts, therefore, resort to Markov Chain Monte Carlo (MCMC) methods for the generation of sample observations that approximate the desired posterior distribution. Standard MCMC methods simulate sample values from the desired posterior distribution via random proposals. As a result, the mechanism used to generate the proposals inevitably determines the efficiency of the algorithm. One of the modern MCMC techniques designed to explore the high-dimensional space more efficiently is Hamiltonian Monte Carlo (HMC), based on the Hamiltonian differential equations. Inspired by classical mechanics, these equations incorporate a latent variable to generate MCMC proposals that are likely to be accepted. This dissertation discusses how such a powerful computational approach can be used for implementing statistical models. Along this line, I created a unified computational procedure for using HMC to fit various types of statistical models. The procedure that I proposed can be applied to a broad class of models, including linear models, generalized linear models, mixed-effects models, and various types of semiparametric regression models. To facilitate the fitting of a diverse set of models, I incorporated new parameterization and decomposition schemes to ensure the numerical performance of Bayesian model fitting without sacrificing the procedure’s general applicability. As a concrete application, I demonstrate how to use the proposed procedure to fit a multivariate generalized additive model (GAM), a nonstandard statistical model with a complex covariance structure and numerous parameters. 
Byproducts of the research include two software packages that allow practical data analysts to use the proposed computational method to fit their own models. The research’s main methodological contribution is the unified computational approach that it presents for Bayesian model fitting that can be used for standard and nonstandard statistical models. Availability of such a procedure has greatly enhanced statistical modelers’ toolbox for implementing new and nonstandard statistical models. Bayesian Computation Generalized Additive Model Hamiltonian Monte Carlo Markov Chain Monte Carlo Semiparametric Regression
