  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

ESTIMATION IN PARTIALLY LINEAR MODELS WITH CORRELATED OBSERVATIONS AND CHANGE-POINT MODELS

Fan, Liangdong 01 January 2018 (has links)
Methods of estimating parametric and nonparametric components, as well as properties of the corresponding estimators, have been examined in partially linear models by Wahba [1987], Green et al. [1985], Engle et al. [1986], Speckman [1988], Hu et al. [2004], and Charnigo et al. [2015], among others. These models are appealing due to their flexibility and wide range of practical applications, including the electricity usage study by Engle et al. [1986] and the gum disease study by Speckman [1988], where a parametric component explains linear trends and a nonparametric part captures nonlinear relationships. The compound estimator (Charnigo et al. [2015]) has been used to estimate the nonparametric component of such a model with multiple covariates, in conjunction with linear mixed modeling for the parametric component. These authors showed, under a strict orthogonality condition, that the parametric and nonparametric component estimators could achieve what appear to be (nearly) optimal rates, even in the presence of subject-specific random effects. We continue this line of research on partially linear models with subject-specific random intercepts. Inspired by Speckman [1988], we propose estimators of both the parametric and nonparametric components of a partially linear model, for which consistency is achievable under an orthogonality condition. We also examine a scenario without orthogonality and find that bias can persist asymptotically. The random intercepts accommodate analysis of individuals on whom repeated measures are taken. We illustrate our estimators in a biomedical case study and assess their finite-sample performance in simulation studies. Jump points have often been found within the domains of nonparametric models (Muller [1992], Loader [1996], and Gijbels et al. [1999]), and falsely assuming that the underlying mean response is continuous may lead to a poor fit.
We study a specific type of change-point where the underlying mean response is continuous on both left and right sides of the change-point. We identify the convergence rate of the estimator proposed in Liu [2017] and illustrate the result in simulation studies.
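The Speckman [1988] idea mentioned above — smooth the nonparametric trend out of both the response and the linear covariates, then regress the residuals — can be sketched in a few lines. This is a minimal numpy illustration of that general partialling-out strategy under assumed toy data (the bandwidth, kernel, and simulated model are illustrative choices; it omits the random intercepts and is not the dissertation's proposed estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated partially linear model: y = X @ beta + f(t) + noise
n = 200
t = np.sort(rng.uniform(0, 1, n))
X = rng.normal(size=(n, 2))
beta_true = np.array([1.5, -2.0])
f = np.sin(2 * np.pi * t)                        # smooth nonparametric part
y = X @ beta_true + f + 0.3 * rng.normal(size=n)

# Nadaraya-Watson smoother matrix S over t (bandwidth picked by hand)
h = 0.05
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)

# Speckman-type estimation: partial the smooth trend out of X and y,
# regress the residuals, then smooth what is left to recover f
X_tilde = X - S @ X
y_tilde = y - S @ y
beta_hat = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]
f_hat = S @ (y - X @ beta_hat)
```

Under the simulated design, `beta_hat` recovers the linear coefficients closely and `f_hat` tracks the sine trend; the orthogonality condition discussed in the abstract governs whether such accuracy survives asymptotically.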
42

High Dimensional Multivariate Inference Under General Conditions

Kong, Xiaoli 01 January 2018 (has links)
In this dissertation, we investigate four distinct but interrelated problems in high-dimensional inference on mean vectors across multiple groups. The first problem concerns profile analysis of high-dimensional repeated measures. We introduce new test statistics and derive their asymptotic distributions under normality for both equal and unequal covariance cases. Our derivations of the asymptotic distributions mimic those of the Central Limit Theorem, with some important peculiarities addressed with sufficient rigor. We also derive consistent and unbiased estimators of the asymptotic variances for the equal and unequal covariance cases, respectively. The second problem concerns accurate inference for high-dimensional repeated measures in factorial designs, as well as any comparisons among the cell means. We derive an asymptotic expansion for the null distribution and the quantiles of a suitable test statistic under normality, and we derive second-order consistent estimators of the parameters appearing in the approximate distribution. The most important contribution is the high accuracy of the methods, in the sense that p-values are accurate up to second order in the sample size as well as in the dimension. The third problem pertains to high-dimensional inference under non-normality. We relax the dependence conditions that have become standard assumptions in high-dimensional inference; with the relaxed conditions, the scope of applicability of the results broadens. The fourth problem concerns a fully nonparametric rank-based comparison of high-dimensional populations. To develop the theory in this context, we prove a novel result on the asymptotic behavior of quadratic forms in ranks. Simulation studies provide evidence that our methods perform well in high-dimensional situations. Real data from an electroencephalograph (EEG) study of alcoholic and control subjects are analyzed to illustrate the application of the results.
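To fix ideas on why such tests differ from classical multivariate ones: when the dimension is comparable to the sample size, Hotelling's T² is unusable, and high-dimensional mean tests instead build unbiased estimates of quantities such as ||μ₁ − μ₂||² from U-statistics. The sketch below shows a Chen–Qin-style numerator of that kind on simulated data; it is a generic illustration of the high-dimensional setting, not any of the dissertation's four proposed statistics:

```python
import numpy as np

def mean_diff_sq(X, Y):
    """Unbiased estimate of ||mu_X - mu_Y||^2 built from the U-statistics
    sum_{i != j} X_i' X_j, etc. (a Chen-Qin-style numerator; a sketch only)."""
    n1, n2 = len(X), len(Y)
    sx, sy = X.sum(axis=0), Y.sum(axis=0)
    # sum over i != j of X_i' X_j equals ||sum X_i||^2 minus the diagonal terms
    xx = (sx @ sx - (X * X).sum()) / (n1 * (n1 - 1))
    yy = (sy @ sy - (Y * Y).sum()) / (n2 * (n2 - 1))
    xy = (sx @ sy) / (n1 * n2)
    return xx + yy - 2 * xy

# Dimension p equal to the sample size n: Hotelling's T^2 breaks down here
rng = np.random.default_rng(7)
n, p = 100, 100
delta = np.full(p, 0.2)              # true ||mu_X - mu_Y||^2 = p * 0.04 = 4
X = rng.normal(size=(n, p))
Y = delta + rng.normal(size=(n, p))
est_alt = mean_diff_sq(X, Y)
est_null = mean_diff_sq(X, rng.normal(size=(n, p)))
```

Dividing such a numerator by a consistent estimate of its standard deviation is what yields the asymptotically normal test statistics whose distributions the dissertation studies with much greater refinement.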
43

Mixtures-of-Regressions with Measurement Error

Fang, Xiaoqiong 01 January 2018 (has links)
Finite mixture models have been studied for a long time, but traditional methods assume that the variables are measured without error. Mixtures-of-regressions models with measurement error pose challenges to statisticians, since both the mixture structure and the presence of measurement error can lead to inconsistent estimates of the regression coefficients. To resolve this inconsistency, we propose a series of methods to estimate the mixture likelihood of the mixtures-of-regressions model when there is measurement error in both the responses and the predictors. Different estimators of the parameters are derived and compared with respect to their relative efficiencies. Simulation results show that the proposed estimation methods work well and improve the estimation process.
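For readers unfamiliar with the baseline model: a standard two-component mixture of regressions (without measurement error) is usually fit by EM, alternating posterior component probabilities with weighted least squares. The sketch below shows that baseline on simulated data; the simulated lines, starting values, and homoscedastic normal errors are illustrative assumptions, and the measurement-error corrections that are the dissertation's actual contribution are not included:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a two-component mixture of regressions (no measurement error here)
n = 400
x = rng.uniform(-2, 2, n)
z = rng.random(n) < 0.5                          # latent component labels
y = np.where(z, 2.0 + 1.0 * x, -1.0 - 1.5 * x) + 0.2 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# EM for the mixture likelihood; starting values are deliberately rough
pi_, sigma = 0.5, 1.0
B = np.array([[1.0, 0.5], [-0.5, -1.0]])         # rows: (intercept, slope)

for _ in range(200):
    # E-step: posterior probability that each point belongs to component 1
    resid = y[:, None] - X @ B.T                 # n x 2 residual matrix
    dens = np.exp(-0.5 * (resid / sigma) ** 2)
    w = pi_ * dens[:, 0] / (pi_ * dens[:, 0] + (1 - pi_) * dens[:, 1])
    # M-step: weighted least squares per component, then update pi and sigma
    for k, wk in enumerate([w, 1 - w]):
        XtW = X.T * wk                           # 2 x n weighted design
        B[k] = np.linalg.solve(XtW @ X, XtW @ y)
    pi_ = w.mean()
    resid = y[:, None] - X @ B.T
    sigma = np.sqrt((w * resid[:, 0] ** 2 + (1 - w) * resid[:, 1] ** 2).mean())
```

When the predictors are themselves observed with error, the weighted least squares step above is exactly where the inconsistency enters, which is what motivates the corrected estimators proposed in the abstract.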
44

MULTIFACTOR DIMENSIONALITY REDUCTION WITH P RISK SCORES PER PERSON

Li, Ye 01 January 2018 (has links)
After reviewing Multifactor Dimensionality Reduction (MDR) and its extensions, an approach to obtain P (P > 1) risk scores is proposed to predict a continuous outcome for each subject. We study the mean square error (MSE) of dimensionality-reduced models fitted with sets of two risk scores and investigate the MSE for several special cases of the covariance matrix. A methodology is proposed to select a best set of P risk scores when P is specified a priori. Simulation studies based on true models of different dimensions (larger than 3) demonstrate that the selected set of P (P > 1) risk scores outperforms the single aggregated risk score generated in AQMDR, and they illustrate that our methodology can determine a best set of P risk scores effectively. Under different assumptions on the dimension of the true model, we consider the preferable choice between the best set of two risk scores and the best set of three risk scores. Further, we present a methodology to assess a set of P risk scores when P is not given a priori. Expressions for the asymptotic estimated mean square prediction error (MSPE) are derived for one-dimensional and two-dimensional models. In the last main chapter, we apply the methodology of selecting a best set of risk scores, with P specified a priori, to Alzheimer's disease data, obtaining a set of two risk scores and a set of three risk scores for each subject to predict measurements on biomarkers that are crucially involved in Alzheimer's disease.
45

THE FAMILY OF CONDITIONAL PENALIZED METHODS WITH THEIR APPLICATION IN SUFFICIENT VARIABLE SELECTION

Xie, Jin 01 January 2018 (has links)
When scientists know in advance that some features (variables) are important in modeling data, those features should be kept in the model. How can we utilize this prior information to effectively find other important features? This dissertation provides a solution that uses such prior information. We propose the Conditional Adaptive Lasso (CAL) estimator to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. Based on CAL, we also propose the Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable Screening (CS-SCAL-VS) algorithms. The asymptotic and oracle properties are proved. Simulations, especially for large p, small n problems, are performed with comparisons to other existing methods. We further extend the linear model setup to generalized linear models (GLMs). Instead of least squares, we consider the likelihood function with an L1 penalty, that is, penalized likelihood methods. We propose the Generalized Conditional Adaptive Lasso (GCAL) for generalized linear models and then further extend the method to any penalty term satisfying certain regularity conditions, yielding the Conditionally Penalized Estimate (CPE). Asymptotic and oracle properties are shown, and four corresponding sufficient variable screening algorithms are proposed. Simulation examples evaluate our methods in comparison with existing ones. GCAL is also evaluated on a real data set on leukemia.
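The conditioning idea — leave known-important features unpenalized while adaptive weights penalize the rest — can be illustrated with a plain coordinate-descent adaptive lasso. This is a sketch of the general idea only, not the dissertation's CAL algorithm; the penalty scaling, weight formula, and toy data are illustrative assumptions:

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding operator used in lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def conditional_adaptive_lasso(X, y, keep, lam, n_iter=500):
    """Adaptive lasso via coordinate descent where features in `keep` are
    never penalized; remaining penalties use 1/|beta_ols| adaptive weights.
    A sketch of the conditioning idea, not the dissertation's CAL."""
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / np.maximum(np.abs(beta_ols), 1e-8)   # adaptive weights
    w[list(keep)] = 0.0                            # prior info: no penalty
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]                 # form the partial residual
            beta[j] = soft(X[:, j] @ r, n * lam * w[j]) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

# Toy check: only features 0 and 1 are active; we "know" feature 0 matters
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = X[:, 0] * 2.0 + X[:, 1] * 1.0 + 0.1 * rng.normal(size=200)
beta = conditional_adaptive_lasso(X, y, keep={0}, lam=0.1)
```

In the toy run, the conditioned coefficient is estimated essentially without shrinkage, the other active coefficient is selected with mild shrinkage, and the noise coefficients are set exactly to zero.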
46

A Flexible Zero-Inflated Poisson Regression Model

Roemmele, Eric S. 01 January 2019 (has links)
A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the counts. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are perhaps the most commonly used. However, the fully parametric ZIP regression model can be restrictive, especially with respect to the mixing proportions. Taking inspiration from recent literature on semiparametric mixtures-of-regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an "EM-like" algorithm for estimation and a summary of the asymptotic properties of the estimators. The proposed semiparametric models are then applied to a data set involving clandestine methamphetamine laboratories and Alzheimer's disease.
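The two-component structure is easy to see in the simplest, intercept-only case: only the observed zeros have an ambiguous origin, so EM just reweights them. The sketch below fits that minimal ZIP model to simulated counts; it has no predictors and is the textbook parametric baseline, not the semiparametric model proposed in the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate zero-inflated Poisson counts: with probability pi the count is a
# structural zero, otherwise it is drawn from Poisson(mu)
n, pi_true, mu_true = 2000, 0.3, 2.5
structural = rng.random(n) < pi_true
y = np.where(structural, 0, rng.poisson(mu_true, n))

# EM for the intercept-only ZIP model (only zeros carry a latent label)
pi_, mu = 0.5, 1.0
for _ in range(200):
    # E-step: posterior probability that an observed zero is structural
    p0 = np.exp(-mu)                                  # Poisson mass at zero
    w = np.where(y == 0, pi_ / (pi_ + (1 - pi_) * p0), 0.0)
    # M-step: update the mixing proportion and the Poisson mean
    pi_ = w.mean()
    mu = ((1 - w) * y).sum() / (1 - w).sum()
```

The semiparametric extension replaces the constant mixing proportion with a flexibly estimated function of covariates, which is precisely the restriction the abstract identifies in the fully parametric ZIP model.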
47

UNSUPERVISED LEARNING IN PHYLOGENOMIC ANALYSIS OVER THE SPACE OF PHYLOGENETIC TREES

Kang, Qiwen 01 January 2019 (has links)
A phylogenetic tree represents the evolutionary history of species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and it is well known that statistical learning methods are needed to handle and analyze the large amounts of data that can now be generated relatively cheaply with new technologies. Based on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, intrinsically unsupervised, can find outliers among thousands or even more genes. Its ability to analyze large numbers of genes, even with missing information, makes it unique among many parametric methods. At the same time, the exploration of statistical analysis in the high-dimensional space of phylogenetic trees has never stopped, and many tree metrics have been proposed for statistical methodology; the tropical metric is one of them. We implement an MCMC sampling method to estimate the principal components in tree space with the tropical metric, achieving dimension reduction and visualizing the result in a 2-D tropical triangle.
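The tropical metric underlying the principal component analysis above has a very short formula on vectors modulo the all-ones direction. A minimal sketch (the example vectors are arbitrary; representing trees by pairwise-distance vectors is the usual convention in this literature):

```python
import numpy as np

def tropical_distance(u, v):
    """Tropical metric on the quotient space R^n / R(1,...,1):
    d(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i)."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(d.max() - d.min())

u = np.array([0.0, 3.0, 1.0])
v = np.array([1.0, 1.0, 1.0])
# u - v = [-1, 2, 0], so the distance is 2 - (-1) = 3
```

Because only coordinate differences enter, the distance is invariant under adding a constant to every coordinate, which is what makes it well defined on the quotient space where tree vectors live.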
48

Serial Testing for Detection of Multilocus Genetic Interactions

Al-Khaledi, Zaid T. 01 January 2019 (has links)
A method to detect relationships between disease susceptibility and multilocus genetic interactions is the Multifactor Dimensionality Reduction (MDR) technique pioneered by Ritchie et al. (2001). Since its introduction, many extensions have been pursued to deal with non-binary outcomes and/or to account for multiple interactions simultaneously. One case that MDR does not handle is studying the effects of multilocus genetic interactions on continuous traits (blood pressure, weight, etc.). Culverhouse et al. (2004) and Gui et al. (2013) proposed two different methods to analyze such a case. Gui et al. (2013) introduced Quantitative Multifactor Dimensionality Reduction (QMDR), which uses the overall average of the response variable to classify individuals into risk groups. This classification mechanism may not be efficient under some circumstances, especially when the overall mean is close to some multilocus means. To address such difficulties, we propose a new algorithm, Ordered Combinatorial Quantitative Multifactor Dimensionality Reduction (OQMDR), which uses a series of tests, based on the ascending order of multilocus means, to identify the best interactions of different orders with risk patterns that minimize the prediction error. Ten-fold cross-validation is used to choose among the resulting models, and regular permutation tests are used to assess the significance of the selected model. The assessment procedure is further modified by utilizing the generalized extreme-value distribution to enhance the efficiency of the evaluation process. We present results from a simulation study to illustrate the performance of the algorithm, and we apply the proposed algorithm to a genetic data set associated with Alzheimer's disease.
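The QMDR classification rule being criticized above is simple to state: for each multilocus genotype cell, compare the within-cell trait mean to the overall mean. The sketch below applies that rule to simulated two-locus data; the genotype coding, effect size, and noise level are illustrative assumptions, and this is the Gui et al. (2013) baseline rather than the proposed OQMDR:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy quantitative trait influenced by a two-locus interaction
n = 1000
g1 = rng.integers(0, 3, n)                 # genotypes coded 0/1/2
g2 = rng.integers(0, 3, n)
trait = 1.0 * ((g1 == 2) & (g2 == 2)) + rng.normal(scale=0.5, size=n)

# QMDR-style rule: a genotype cell is "high risk" when its within-cell
# trait mean exceeds the overall trait mean
overall = trait.mean()
high_risk = set()
for a in range(3):
    for b in range(3):
        cell = (g1 == a) & (g2 == b)
        if cell.any() and trait[cell].mean() > overall:
            high_risk.add((a, b))
```

When many cell means sit near the overall mean, this single threshold classifies them unreliably; ordering the multilocus means and testing them sequentially, as OQMDR does, is designed to avoid exactly that failure mode.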
49

TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA

Weng, Jiaying 01 January 2019 (has links)
The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches for analyzing massive data. Sufficient dimension reduction (SDR) is an important tool in modern data analysis and has received extensive attention in both academia and industry. In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to high-dimensional data problems. For the ultra-high-dimensional problem, we investigate both eigenvalue decomposition and minimum discrepancy approaches to achieve optimal solutions, and we develop a novel and efficient optimization algorithm to obtain the sparse estimates. We derive asymptotic properties of the proposed estimators and demonstrate their efficiency gains compared to traditional estimators. The oracle properties of the sparse estimates are also derived. Simulation studies and real data examples illustrate the effectiveness of the proposed methods. The wavelet transform is another tool that effectively detects information from time-localization of high frequencies. In parallel to the proposed Fourier transform methods, we also develop a wavelet transform version of the approach and derive the asymptotic properties of the resulting estimators.
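The "avoid slicing" point can be made concrete: instead of partitioning the response into slices as in sliced inverse regression, one can weight the predictors by cos(ty) and sin(ty) over a grid of frequencies and eigen-decompose the resulting candidate matrix. The sketch below is a schematic version of that idea on a simulated single-index model; the frequency grid and candidate-matrix form are illustrative choices, not the dissertation's estimator:

```python
import numpy as np

rng = np.random.default_rng(5)

# Single-index model: y depends on X only through the direction e1
n, p = 2000, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.2 * rng.normal(size=n)

# Fourier-weighted inverse regression: instead of slicing y, accumulate
# covariances between X and the weights cos(t*y), sin(t*y) over frequencies t
Xc = X - X.mean(axis=0)
M = np.zeros((p, p))
for t in np.linspace(0.1, 1.0, 10):
    for w in (np.cos(t * y), np.sin(t * y)):
        m = (w - w.mean())[None, :] @ Xc / n     # approx Cov(w(y), X), 1 x p
        M += m.T @ m

# The leading eigenvector of the candidate matrix estimates the direction
vals, vecs = np.linalg.eigh(M)
b_hat = vecs[:, -1]
```

In the simulation, `b_hat` aligns closely with e1 up to sign, with no tuning of slice boundaries required; replacing the trigonometric weights with wavelet weights gives the parallel construction mentioned at the end of the abstract.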
50

A NEW INDEPENDENCE MEASURE AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA ANALYSIS

Ke, Chenlu 01 January 2019 (has links)
This dissertation comprises three consecutive topics. First, we propose a novel class of independence measures for testing independence between two random vectors, based on the discrepancy between the conditional and marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends the typical ANOVA to a kernel ANOVA that can test a more general hypothesis of equal distributions among groups; the index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large p, small n setting. Our approach incorporates marginal information between each predictor and the response as well as joint information among predictors, and as a result it is more capable of selecting all truly active variables than marginal selection methods. Furthermore, our procedure can handle both continuous and discrete responses with mixed-type predictors. We establish the sure screening property of the proposed approach under mild conditions. Third, we focus on a model-free sufficient dimension reduction approach using the new measure; our method does not require strong assumptions on predictors or responses. An algorithm is developed to find dimension reduction directions using sequential quadratic programming. We illustrate the advantages of the new measure and its two applications in high-dimensional data analysis through numerical studies across a variety of settings.
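The core quantity — the gap between conditional and marginal characteristic functions — can be approximated directly from data in the categorical case. The sketch below evaluates a grid-based version of that discrepancy for a two-group example; the frequency grid and uniform weighting are illustrative simplifications of the kernel-weighted index proposed in the abstract:

```python
import numpy as np

def cf_discrepancy(y, groups, ts=None):
    """Average squared gap between conditional and marginal empirical
    characteristic functions of y over the levels of `groups`.
    A grid-based sketch of the idea, not the dissertation's kernel index."""
    if ts is None:
        ts = np.linspace(0.1, 2.0, 20)

    def ecf(v):                                   # empirical CF on the grid
        return np.exp(1j * np.outer(ts, v)).mean(axis=1)

    phi = ecf(y)                                  # marginal ECF
    stat = 0.0
    for g in np.unique(groups):
        idx = groups == g
        stat += idx.mean() * (np.abs(ecf(y[idx]) - phi) ** 2).mean()
    return stat

rng = np.random.default_rng(6)
g = np.repeat([0, 1], 500)
y_null = rng.normal(size=1000)                    # same law in both groups
y_alt = np.where(g == 0, rng.normal(size=1000),
                 rng.normal(2.0, 1.0, 1000))      # group 1 shifted by 2
```

Under equal group distributions the statistic is near zero, while a location shift between groups inflates it sharply; this is the behavior that lets the index drive both the kernel ANOVA test and the screening procedure described above.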
