Global ETD Search

1	Robust Statistical Approaches Dealing with High-Dimensional Observational Data Zhu, Huichen January 2019 The theme of this dissertation is the development of robust statistical approaches for high-dimensional observational data. Advances in technology have made data sets more accessible than at any other time in history. Abundant data lead to numerous appealing findings and, at the same time, demand more careful analysis. We encounter many obstacles when dealing with high-dimensional data. Heterogeneity and complex interaction structures rule out the traditional mean regression method and call for a novel approach that circumvents the complexity and yields significant conclusions. The missing data mechanism in high-dimensional data is complicated and hard to manage with existing methods. This dissertation contains three parts to tackle these obstacles: (1) a tree-based method integrated with domain knowledge to improve prediction accuracy; (2) a tree-based method with linear splits to accommodate large-scale and highly correlated data sets; (3) an integrative analysis method to reduce the dimension and impute block-wise missing data simultaneously. In the first part of the dissertation, we propose a tree-based method called conditional quantile random forest (CQRF) to improve screening and intervention for the onset of mental disorder by incorporating rich and comprehensive electronic medical records (EMR). Our research is motivated by the REactions to Acute Care and Hospitalization (REACH) study, an ongoing prospective observational cohort study of patients with symptoms of a suspected acute coronary syndrome (ACS). We aim to develop a robust and effective statistical prediction method. The proposed approach takes population heterogeneity fully into account. We partition the sample space guided by quantile regression over the entire quantile process. 
The proposed CQRF can provide a more comprehensive and accurate prediction. We also provide theoretical justification for the estimated quantile process. In the second part of the dissertation, we apply the proposed CQRF to the REACH data set. The predictive analysis shows that, for both the entire sample and the high-risk group, the proposed CQRF provides more accurate predictions than other existing and widely used methods. The variable importance scores based on the proposed CQRF give a promising result: they identify two variables that the qualitative study has shown to be critical features. We also apply the proposed CQRF to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study data set. We show that the proposed approach improves personalized medicine recommendations compared with an existing treatment recommendation method. We also conduct two simulation studies based on the two real data sets. Both simulation studies validate the consistency of the estimated quantile process. In the second part, we also extend the proposed CQRF from univariate splits to linear splits to accommodate a large number of highly correlated variables. Gene-environment interaction is a topic of wide concern, since the traits of complex diseases are always difficult to understand, and we are eager to find interventions tailored to individual genetic variations. The proposed approach is applied to a Breast Cancer Family Registry (BCFR) study data set with body mass index (BMI) as the response variable, together with several nutrition intake factors and genotype variables. We aim to determine which genetic variations affect the heterogeneous effect of the environmental factors on BMI. We devise a criterion that measures the relationship between the response variable and gene variants, conditional on the environmental factor, to determine the optimal linear combination split. 
The variable importance score is also calculated by summing the criterion across all splits in the random forest. The results show that the top-ranked genes prioritized by the proposed importance scores modify the effect of the environmental factors on BMI. In the third part, we introduce an integrative analysis approach called generalized integrative principal component analysis (GIPCA). Heterogeneous data types and the presence of block-wise missing data pose significant challenges to the integration of multi-source data and to further statistical analyses. No existing method can easily accommodate data of multiple types with a block-wise missing structure. The proposed GIPCA is a low-rank method that performs dimension reduction and imputation of block-wise missing data simultaneously for data of multiple types. Both the simulation study and the real data analysis show that the proposed approach achieves good missing data imputation accuracy and identifies some meaningful signals. Biometry; Biometry--Data processing; Statistics; Decision trees
2	Survival Analysis using Bivariate Archimedean Copulas Chandra, Krishnendu January 2015 In this dissertation we solve the nonidentifiability problem of Archimedean copula models based on dependent censored data (see [Wang, 2012]). We give a set of identifiability conditions for a special class of bivariate frailty models. Our simulation results show that our proposed model is identifiable under our proposed conditions. We use the EM algorithm to estimate the unknown parameters, and the proposed estimation approach can be applied to fit dependent censored data when the dependence is of research interest. The marginal survival functions can be estimated using the copula-graphic estimator (see [Zheng and Klein, 1995] and [Rivest and Wells, 2001]) or the estimator proposed by [Wang, 2014]. We also propose two model selection procedures for Archimedean copula models, one for uncensored data and the other for right-censored bivariate data. Our simulation results are similar to those of [Wang and Wells, 2000] and suggest that both procedures work quite well. The idea of our proposed model selection procedure originates from the model selection procedure for Archimedean copula models proposed by [Wang and Wells, 2000] for right-censored bivariate data, which uses the L2 norm corresponding to the Kendall distribution function. A suitable bootstrap procedure is yet to be suggested for our method. We further propose a new parameter estimator and a simple goodness-of-fit test for Archimedean copula models when the bivariate data are under fixed left truncation. Our simulation results suggest that our procedure needs to be improved so that it can be more powerful, reliable and efficient. In our strategy, to obtain estimates of the unknown parameters, we rely heavily on the concept of truncated tau (a measure of association established by [Manatunga and Oakes, 1996] for left-truncated data). 
The idea of our goodness-of-fit test originates from the goodness-of-fit test for Archimedean copula models proposed by [Wang, 2010] for right-censored bivariate data. Censored observations (Statistics); Copulas (Mathematical statistics); Biometry
3	Marginal Screening on Survival Data Huang, Tzu Jung January 2017 This work develops a marginal screening test to detect the presence of significant predictors for a right-censored time-to-event outcome under a high-dimensional accelerated failure time (AFT) model. Establishing a rigorous screening test in this setting is challenging, not only because of the right censoring but also because of post-selection inference. The oracle property in such situations fails to ensure adequate control of the family-wise error rate, and this raises questions about the applicability of standard inferential methods. McKeague and Qian (2015) constructed an adaptive resampling test to circumvent this problem under ordinary linear regression. To accommodate right censoring, we develop a test statistic based on a maximally selected Koul-Susarla-Van Ryzin estimator from a marginal AFT model. A regularized bootstrap method is used to calibrate the test. Our test is more powerful and less conservative than the Bonferroni correction and other competing methods. The proposed method is evaluated in simulation studies and applied to two real data sets. Biometry; Statistics; Failure time data analysis
4	Detection of multiple change-points in hazard models Unknown Date Change-point detection in the hazard rate function is an important research topic in survival analysis. In this dissertation, we first review existing methods for single change-point detection in the piecewise exponential hazard model. Then we consider the problem of estimating the change point in the presence of right censoring and long-term survivors, using the Kaplan-Meier estimator for the susceptible proportion. The maximum likelihood estimators are shown to be consistent. Taking one step further, we propose a counting-process-based and least-squares-based change-point detection algorithm. For the single change-point case, consistency results are obtained. We then consider the detection of multiple change-points in the presence of long-term survivors via maximum-likelihood-based and counting-process-based methods. Last but not least, we use a weighted-least-squares-based and counting-process-based method for the detection of multiple change-points with long-term survivors and covariates. For multiple change-point detection, simulation studies show good performance of our estimators under various parameter settings for both methods. All methods are applied to real data analyses. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2014. / FAU Electronic Theses and Dissertations Collection Problem solving--Data processing. Process control--Statistical methods. Point processes. Mathematical statistics.
5	Variable selection and structural discovery in joint models of longitudinal and survival data He, Zangdong January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Joint models of longitudinal and survival outcomes have been used with increasing frequency in clinical investigations. Correct specification of fixed and random effects, as well as their functional forms, is essential for practical data analysis. However, no existing methods have been developed to meet this need in a joint model setting. In this dissertation, I describe a penalized likelihood-based method with adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions for model selection. By reparameterizing variance components through a Cholesky decomposition, I introduce a penalty function of group shrinkage; the penalized likelihood is approximated by Gaussian quadrature and optimized by an EM algorithm. The functional forms of the independent effects are determined through a procedure for structural discovery. Specifically, I first construct the model by penalized cubic B-spline and then decompose the B-spline into linear and nonlinear elements by spectral decomposition. The decomposition represents the model in a mixed-effects model format, and I then use the mixed-effects variable selection method to perform structural discovery. Simulation studies show excellent performance. A clinical application is described to illustrate the use of the proposed methods, and the analytical results demonstrate the usefulness of the methods. Joint models; Mixed effect selection; Structural discovery; Adaptive LASSO; Gaussian quadrature; EM algorithm; Regression analysis -- Data processing; Numerical analysis -- Data processing; Spectral theory (Mathematics); Estimation theory -- Analysis; Statistics -- Data processing; Calculus of variations; Structural bioinformatics; Parameter estimation
6	Joint models for longitudinal and survival data Yang, Lili 11 July 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Epidemiologic and clinical studies routinely collect longitudinal measures of multiple outcomes. These longitudinal outcomes can be used to establish the temporal order of relevant biological processes and their association with the onset of clinical symptoms. In the first part of this thesis, we proposed to use bivariate change point models for two longitudinal outcomes, with a focus on estimating the correlation between the two change points. We adopted a Bayesian approach for parameter estimation and inference. In the second part, we considered the situation in which a time-to-event outcome is also collected along with multiple longitudinal biomarkers measured until the occurrence of the event or censoring. Joint models for longitudinal and time-to-event data can be used to estimate the association between the characteristics of the longitudinal measures over time and survival time. We developed a maximum-likelihood method to jointly model multiple longitudinal biomarkers and a time-to-event outcome. In addition, we focused on predicting conditional survival probabilities and evaluating the predictive accuracy of multiple longitudinal biomarkers in the joint modeling framework. We assessed the performance of the proposed methods in simulation studies and applied the new methods to data sets from two cohort studies. / National Institutes of Health (NIH) Grants R01 AG019181, R24 MH080827, P30 AG10133, R01 AG09956. joint models; longitudinal data; survival data; bivariate change point models; prediction; Bayesian method; EM algorithm; Biologically-inspired computing; Probability measures; Expectation-maximization algorithms; Failure time data analysis; Numerical analysis -- Data processing; Clinical trials -- Statistical methods
