1 |
Empirical likelihood and mean-variance models for longitudinal data. Li, Daoji. January 2011
Improving estimation efficiency has long been an important aspect of statistical modelling. Our goal is to develop new statistical methodologies that yield more efficient estimators in the analysis of longitudinal data. In this thesis, we consider two approaches to improving estimation efficiency: empirical likelihood, and joint modelling of the mean and variance. Part I investigates empirical likelihood-based inference for longitudinal data within the framework of generalized linear models. The proposed procedure accounts for the within-subject correlation without directly estimating the nuisance parameters in the correlation matrix, and it retains optimality even if the working correlation structure is misspecified. The proposed approach yields more efficient estimators than conventional generalized estimating equations and achieves the same asymptotic variance as methods based on quadratic inference functions. Part II focuses on joint mean-variance models. We propose a data-driven approach to modelling the mean and variance simultaneously, which yields more efficient estimates of the mean regression parameters than the conventional generalized estimating equations approach, even if the within-subject correlation structure in our joint mean-variance models is misspecified. Joint mean-variance models in both parametric and semi-parametric forms are investigated. Extensive simulation studies are conducted to assess the performance of the proposed approaches. Three longitudinal data sets, the Ohio Children’s wheeze status data (Ware et al., 1984), the Cattle data (Kenward, 1987) and the CD4+ data (Kaslow et al., 1987), are used to demonstrate our models and approaches.
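Part I rests on empirical likelihood. As a minimal sketch of the basic construction only, the code below computes Owen's empirical likelihood ratio statistic for a univariate mean, assuming i.i.d. scalar observations and the single estimating function g_i = x_i - mu. It is not the thesis's longitudinal GLM procedure. It solves sum_i g_i / (1 + lambda * g_i) = 0 for the Lagrange multiplier lambda and returns -2 log R(mu) = 2 * sum_i log(1 + lambda * g_i). The function name `el_log_ratio` and the bisection solver are illustrative choices, not taken from the thesis.

```python
import math


def el_log_ratio(x, mu, tol=1e-12, max_iter=200):
    """Owen's empirical likelihood ratio statistic -2 log R(mu) for a univariate mean.

    Illustrative sketch: assumes i.i.d. scalar data and the estimating
    function g_i = x_i - mu. Not the longitudinal procedure from the thesis.
    """
    g = [xi - mu for xi in x]
    if all(gi == 0 for gi in g):
        return 0.0  # degenerate data equal to mu: uniform weights give R = 1
    # If mu lies outside the convex hull of the data, no positive weights
    # can satisfy sum p_i g_i = 0, so R(mu) = 0 and the statistic is infinite.
    if min(g) >= 0 or max(g) <= 0:
        return math.inf

    # lambda must keep every implied weight positive: 1 + lambda * g_i > 0.
    lo = -1.0 / max(g)
    hi = -1.0 / min(g)

    def score(lam):
        # Root of this function gives the profiled multiplier; strictly decreasing in lam.
        return sum(gi / (1.0 + lam * gi) for gi in g)

    # Bisection on the open interval (lo, hi): score -> +inf at lo and -inf at hi.
    a, b = lo, hi
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        if score(mid) > 0:
            a = mid
        else:
            b = mid
        if b - a < tol:
            break
    lam = 0.5 * (a + b)

    # Weights are p_i = 1 / (n * (1 + lam * g_i)), so -2 log R = 2 * sum log(1 + lam * g_i).
    return 2.0 * sum(math.log(1.0 + lam * gi) for gi in g)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    for mu0 in (3.0, 3.5, 4.5):
        print(mu0, el_log_ratio(data, mu0))
```

Under standard conditions, -2 log R(mu0) at the true mean is asymptotically chi-square with one degree of freedom, which is how the statistic is calibrated for tests and confidence intervals. Extending this to longitudinal data would require vector-valued, subject-level estimating functions and a multivariate solver for lambda; that extension is not shown here.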
|
2 |
Statistical Learning of Proteomics Data and Global Testing for Data with Correlations. Donglai Chen (6405944). 15 May 2019
<div>This dissertation consists of two parts. The first part is a collaborative project with Dr. Szymanski's group in Agronomy at Purdue to predict protein complex assemblies and interactions. Proteins in the leaf cytosol of Arabidopsis were fractionated using Size Exclusion Chromatography (SEC) and mixed-bed Ion Exchange Chromatography (IEX).</div><div>Protein mass spectrometry data were obtained for the two separation platforms, with two replicates each. We combine the four data sets and carry out a series of statistical learning steps: 1) data filtering, 2) a two-round hierarchical clustering to integrate multiple data types, 3) validation of the clustering against known protein complexes, and 4) mining of the dendrograms to predict protein complexes. The method is designed for integrative analysis of different data types and removes the difficulty of choosing an appropriate number of clusters. It provides a statistical learning tool for globally analyzing the oligomerization states of a system of protein complexes.</div><div><br></div><div><br></div><div>The second part examines global hypothesis testing under sparse alternatives and arbitrarily strong dependence. Global tests aggregate information across variables and reduce the burden of multiple testing. In modern data analysis, variables with nonzero effects are often sparse; this is the common setting in genome-wide association study (GWAS) data. Under such sparse alternatives, the minimum p-value and higher criticism tests are particularly effective and more powerful than the F test. However, arbitrarily strong dependence among variables makes it difficult to compute valid p-values for these optimal tests. We develop a latent variable adjusted method to correct the minimum p-value test. After adjustment, the test statistics become weakly dependent and the corresponding null distributions are valid.
We show that if the latent variable is not related to the response variable, power can be improved. Simulation studies show that our method is more powerful than other methods in settings with highly sparse signals and correlated marginal tests. We also demonstrate its application to a real dataset.</div>
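The minimum p-value and higher criticism statistics are simple to compute from a vector of marginal p-values; the hard part, which the thesis addresses, is their null distribution under dependence. A minimal sketch follows. Its min-p calibration assumes independent Uniform(0, 1) p-values under the null, which is exactly the assumption that arbitrarily strong dependence violates. The function names `min_p_test` and `higher_criticism` and the default alpha0 = 0.5 are illustrative choices; the thesis's latent variable adjustment is not implemented here.

```python
import math


def min_p_test(pvals):
    """Minimum p-value test.

    Returns (p_min, global p-value). The global p-value uses
    P(min p <= t) = 1 - (1 - t)^m, which holds only if the m p-values are
    independent Uniform(0, 1) under the null; strong dependence breaks it.
    """
    m = len(pvals)
    p_min = min(pvals)
    # -expm1(m * log1p(-t)) computes 1 - (1 - t)^m accurately for small t.
    global_p = -math.expm1(m * math.log1p(-p_min))
    return p_min, global_p


def higher_criticism(pvals, alpha0=0.5):
    """Higher criticism statistic (Donoho and Jin), maximized over the
    smallest alpha0 * m sorted p-values; p-values of exactly 0 or 1 are skipped."""
    m = len(pvals)
    p_sorted = sorted(pvals)
    k = max(1, int(alpha0 * m))
    best = -math.inf
    for i, p in enumerate(p_sorted[:k], start=1):
        if 0.0 < p < 1.0:
            # Standardized difference between the empirical CDF i/m at p_(i)
            # and its null value p_(i).
            hc = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p))
            best = max(best, hc)
    return best


if __name__ == "__main__":
    # One strong signal among 100 otherwise null-looking tests (a sparse alternative).
    pvals = [1e-6] + [0.5] * 99
    print(min_p_test(pvals))
    print(higher_criticism(pvals))
```

As an extreme case of dependence, if all m tests were perfectly correlated, p_min would itself be Uniform(0, 1) under the null, so the correct global p-value would be p_min rather than 1 - (1 - p_min)^m, and the independence formula would be far too conservative. Miscalibration of this kind is what motivates an adjustment that makes the test statistics weakly dependent.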
|