A Modified Random Forest Kernel for Highly Nonstationary Gaussian Process Regression with Application to Clinical Data
VanHouten, Jacob Paul 25 July 2016
Nonstationary Gaussian process regression can be used to transform irregularly episodic and noisy measurements into continuous probability densities to make them more compatible with standard machine learning algorithms. However, current inference algorithms are time-consuming or have difficulty with the highly bursty, extremely nonstationary data that are common in the medical domain. One efficient and flexible solution uses a partition kernel based on random forests, but its current embodiment produces undesirable pathologies rooted in the piecewise-constant nature of its inferred posteriors. I present a modified random forest kernel that adds new sources of randomness to the trees, which overcomes existing pathologies and produces good results for highly bursty, extremely nonstationary clinical laboratory measurements.
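The partition-kernel idea mentioned above can be illustrated with a minimal sketch: two inputs are scored by how often a random partition of the input space places them in the same cell. The code below is an assumption-laden stand-in, not the dissertation's method: it uses uniformly random cut points on a one-dimensional interval in place of fitted random-forest trees, and the function name and all parameters are hypothetical.

```python
import random


def random_partition_kernel(x1, x2, n_partitions=200, n_cuts=5,
                            low=0.0, high=1.0, seed=0):
    """Fraction of random 1-D partitions that put x1 and x2 in the same cell.

    Assumption: each "tree" is replaced by n_cuts uniformly random cut
    points on [low, high]; a real random forest would learn its splits.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls share partitions
    same_cell = 0
    for _ in range(n_partitions):
        cuts = [rng.uniform(low, high) for _ in range(n_cuts)]
        # The cell index of a point is the number of cut points below it.
        cell1 = sum(c < x1 for c in cuts)
        cell2 = sum(c < x2 for c in cuts)
        if cell1 == cell2:
            same_cell += 1
    return same_cell / n_partitions
```

Because each partition is piecewise constant, a kernel of this kind averages step functions, which is one way to see where the piecewise-constant behavior described in the abstract comes from.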
Rank-Based Semiparametric Methods: Covariate-Adjusted Spearman's Correlation with Probability-Scale Residuals and Cumulative Probability Models
Liu, Qi 27 July 2016
In this dissertation, we develop semiparametric rank-based methods. These types of methods are particularly useful with skewed data, nonlinear relationships, and truncated measurements. Semiparametric rank-based methods can achieve a good balance between robustness and efficiency. The first part of this dissertation develops new estimators for covariate-adjusted Spearman's rank correlation, both partial and conditional, using probability-scale residuals (PSRs). These estimators are consistent for natural extensions of the population parameter of Spearman's rank correlation in the presence of covariates and are general for both continuous and discrete variables. We evaluate their performance with simulations and illustrate their application in two examples. To preserve the rank-based nature of Spearman's correlation, we obtain PSRs from ordinal cumulative probability models for both discrete and continuous variables. Cumulative probability models were first invented to handle discrete ordinal outcomes, and their potential utility for the analysis of continuous outcomes has been largely unrecognized. This motivates the second part of this dissertation: an in-depth study of the application of cumulative probability models to continuous outcomes. When applied to continuous outcomes, these models can be viewed as semiparametric transformation models. We present a latent variable motivation for these models; describe estimation, inference, assumptions, and model diagnostics; conduct extensive simulations to investigate the finite sample performance of these models with and without proper link function specification; and illustrate their application in an HIV study. Finally, we developed an R package, PResiduals, to compute PSRs, to incorporate them into conditional tests of association, and to implement our covariate-adjusted Spearman's rank correlation. 
The third part of this dissertation contains a vignette for this package, in which we illustrate its usage with a publicly available dataset.
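As a rough illustration of why probability-scale residuals preserve the rank-based nature of Spearman's correlation, the sketch below covers only the covariate-free case. It assumes the usual definition of a PSR, P(Y* < y) - P(Y* > y), evaluated under the empirical distribution rather than under a fitted cumulative probability model. The function names are hypothetical and do not reproduce the PResiduals package interface.

```python
import math


def empirical_psr(values):
    """PSR of each observation under the empirical distribution:
    (share of values below it) minus (share of values above it)."""
    n = len(values)
    return [
        (sum(v < y for v in values) - sum(v > y for v in values)) / n
        for y in values
    ]


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def psr_correlation(x, y):
    """Correlation of the PSRs; without covariates this is Spearman's rho."""
    return pearson(empirical_psr(x), empirical_psr(y))
```

In the covariate-adjusted setting described in the abstract, the empirical distribution would be replaced by fitted conditional distributions; that step is not shown here.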
Comparative Causal Effect Estimation and Robust Variance for Longitudinal Data Structures with Applications to Observational HIV Treatment
Tran, Linh Mai 02 September 2016
This dissertation discusses the application and comparative performance of double robust estimators for estimating the intervention-specific mean outcome in longitudinal settings with time-dependent confounding, as well as the corresponding estimator variances. (Abstract shortened by ProQuest.)
Empirical Bayes Methods for Everyday Statistical Problems
Smith, Derek Kyle 27 January 2017
This work develops an empirical Bayes approach to statistical difficulties that arise in real-world applications. Empirical Bayes methods use Bayesian machinery to obtain statistical estimates, but rather than assuming a prior distribution for the model parameters, they estimate the prior from the observed data. Misuse of these methods as though the resulting "posterior distributions" were true Bayes posteriors has led to limited adoption, but careful application can result in improved point estimation in a wide variety of circumstances. The first problem solved via an empirical Bayes approach deals with surrogate outcome measures. Theory for using surrogate outcomes for inference in clinical trials has been developed over the last 30 years, starting with the development of the Prentice criteria for surrogate outcomes in 1989. Theory for using surrogates outside of the clinical trials arena or to develop risk score models is lacking. In this work we propose criteria similar to the Prentice criteria for using surrogates to develop risk scores. We then identify a particular type of surrogate that violates the proposed criteria in a particular way, which we deem a partial surrogate. The behavior of partial surrogates is investigated through a series of simulation studies, and an empirical Bayes weighting scheme is developed that alleviates their pathologic behavior. It is then hypothesized that a common clinical measure, change in perioperative serum creatinine level from baseline, is actually a partial surrogate. It is demonstrated that it displays the same sort of pathologic behaviors seen in the simulation study and that they are similarly rectified using the proposed method. The result is a more accurate predictive model for both short- and long-term measures of kidney function. The second problem solved deals with likelihood support intervals. Likelihood intervals are a way to quantify statistical uncertainty.
Unlike other, more common methods for interval estimation, every value that is included in a support interval must be supported by the data at a specified level. Support intervals have not seen wide usage in practice due to a philosophic belief amongst many in the field that frequency-based or probabilistic inference is somehow stronger than inference based solely on the likelihood. In this work we develop a novel procedure based on the bootstrap for estimating the frequency characteristics of likelihood intervals. The resulting intervals have the frequency properties prized by frequentists, while each individual member of the set also attains a specified support level. An R package, supportInt, was developed to calculate these intervals and published on the Comprehensive R Archive Network. The third problem addressed deals with the design of clinical trials when the potential protocols for the intervention are highly variable. A meta-analysis is presented in which the difficulties this situation presents become apparent. The results of this analysis of randomized trials of perioperative beta-blockade as a potential intervention to prevent myocardial infarction in the surgical setting are completely dependent on the statistical model chosen. In particular, which elements of the trial protocol are pooled and which are allowed by the model to impact the estimate of treatment efficacy completely determine the inference drawn from the data. This problem occurs largely because the trials conducted on the intervention of interest are not richly variable in some aspects of protocol. In this section it is demonstrated that the large single-protocol designs that are frequently advocated can be replaced by multi-arm protocols to more accurately assess the question of an intervention's potential efficacy. Simulation studies are conducted that make use of a novel adaptive randomization scheme based on an empirically estimated likelihood function.
A tool is made available in a Shiny app that allows for the conduct of further studies by the reader under a wide variety of conditions.
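The likelihood support intervals discussed in the second problem above can be sketched for the simplest case, a binomial proportion. This is a minimal illustration under stated assumptions: a 1/k support interval is taken to be the set of parameter values whose likelihood is at least 1/k of the maximum, the default k = 8 is a conventional choice rather than one taken from the dissertation, and the interval is found on a grid. The function name is hypothetical and is not the supportInt package interface; the bootstrap calibration described in the abstract is not shown.

```python
import math


def binomial_support_interval(successes, trials, k=8.0, grid_size=10001):
    """Return (lower, upper) of the 1/k likelihood support interval for a
    binomial proportion, located on an evenly spaced grid over [0, 1]."""

    def loglik(p):
        # Handle the boundaries explicitly to avoid log(0).
        if p <= 0.0:
            return 0.0 if successes == 0 else -math.inf
        if p >= 1.0:
            return 0.0 if successes == trials else -math.inf
        return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

    p_hat = successes / trials                      # maximum likelihood estimate
    threshold = loglik(p_hat) - math.log(k)         # L(p) >= L(p_hat) / k
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    supported = [p for p in grid if loglik(p) >= threshold]
    return supported[0], supported[-1]
```

For example, `binomial_support_interval(5, 10)` returns the range of proportions whose likelihood is within a factor of 8 of the likelihood at 0.5; every value inside that range meets the same support level, which is the property the abstract emphasizes.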
Aspects of Causal Inference within the Evenly Matchable Population: The Average Treatment Effect on the Evenly Matchable Units, Visually Guided Cohort Selection, and Bagged One-to-One Matching
Samuels, Lauren Ruth 13 December 2016
This dissertation consists of three papers related to causal inference about the evenly matchable units in observational studies of treatment effect. The first paper begins by defining the evenly matchable units in a sample or population in which the effect of a binary treatment is of interest: a unit is evenly matchable if the localized region of the (possibly transformed) covariate space centered on that unit contains at least as many units from the opposite group as from its own group. The paper then defines the average treatment effect on the evenly matchable units (ATM) and continues with a discussion of currently available matching methods that can be used to estimate the ATM, followed by the introduction of three new weighting-based approaches to ATM estimation and a case study illustrating some of these techniques. The second paper introduces a freely available web application that allows analysts to combine information from covariate distributions and estimated propensity scores to create transparent, covariate-based study inclusion criteria as a first step in estimation of the ATM or other quantities. The app, Visual Pruner, is freely available at http://statcomp2.vanderbilt.edu:37212/VisualPruner and is easily incorporated into a reproducible-research workflow. The third paper introduces a new technique for estimation of the ATM or other estimands: bagged one-to-one matching (BOOM), which combines the bias-reducing properties of one-to-one matching with the variance-reducing properties of bootstrap aggregating, or bagging. In this paper I describe the BOOM algorithm in detail and investigate its performance in a simulation study and a case study. In the simulation study, the BOOM estimator achieves as much bias reduction as the estimator based on one-to-one matching, while having much lower variance. In the case study, BOOM yields estimates similar to those from one-to-one matching, with narrower 95% confidence intervals.
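The definition of an evenly matchable unit given above can be made concrete with a small sketch. The assumptions are substantial and are mine rather than the dissertation's: a single covariate, a "localized region" taken to be a fixed-radius interval around the unit, and the unit itself counted as a member of its own group. The function name and the `radius` parameter are hypothetical.

```python
def evenly_matchable(x, treated, radius):
    """Flag each unit as evenly matchable or not.

    A unit is flagged True when the interval [x_i - radius, x_i + radius]
    contains at least as many opposite-group units as own-group units
    (the unit itself counts toward its own group in this sketch).
    """
    flags = []
    for i, xi in enumerate(x):
        own = 0
        opposite = 0
        for j, xj in enumerate(x):
            if abs(xj - xi) <= radius:
                if treated[j] == treated[i]:
                    own += 1
                else:
                    opposite += 1
        flags.append(opposite >= own)
    return flags
```

An estimator of the ATM would then restrict or weight the analysis toward the flagged units; the weighting schemes and the BOOM algorithm from the papers are not reproduced here.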
Controlling for Confounding when Association is Quantified by Area Under the ROC Curve
Galadima, Hadiza I 01 January 2015
In the medical literature, there has been an increased interest in evaluating association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not done randomly in observational studies, comparisons of outcomes between exposed and non-exposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding when estimating exposure effect. Previous studies have shown that conditioning on the propensity score results in biased estimation of the odds ratio and hazard ratio. However, there is a lack of research into the performance of propensity score methods for estimating the area under the ROC curve (AUC). In this dissertation, we propose AUC as a measure of effect when outcomes are continuous. The AUC is interpreted as the probability that a randomly selected non-exposed subject has a better response than a randomly selected exposed subject. The aim of this research is to examine methods to control for confounding when association between exposure and outcomes is quantified by AUC. We look at the performance of the propensity score, including determining the optimal choice of variables for the propensity score model. Choices include covariates related to exposure group, covariates related to outcome, covariates related to both exposure and outcome, and all measured covariates. Additionally, we compare the propensity score approach to the conventional regression approach to adjusting the AUC. We conduct a series of simulations to assess the performance of the methodology, where the choice of the best estimator depends on bias, relative bias, mean squared error, and coverage of 95% confidence intervals. Furthermore, we examine the impact of model misspecification in conventional regression adjustment for AUC by incorrectly modelling the covariates in the data.
These modelling errors include omitting covariates, dichotomizing continuous covariates, modelling quadratic covariates as linear, and excluding interaction terms from the model. Finally, a dataset from the shock research unit at the University of Southern California is used to illustrate the estimation of the adjusted AUC using the proposed approaches.
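The probabilistic interpretation of the AUC used in this abstract can be computed directly by comparing every non-exposed response with every exposed response. The sketch below is the unadjusted version only and rests on two assumptions of mine: a larger response counts as "better", and ties contribute one half. The function name is hypothetical, and none of the confounding adjustments studied in the dissertation are shown.

```python
def auc_probability(non_exposed, exposed):
    """Estimate P(non-exposed response > exposed response), counting each
    tied pair as one half, over all non-exposed/exposed pairs."""
    wins = 0.0
    for y0 in non_exposed:
        for y1 in exposed:
            if y0 > y1:
                wins += 1.0
            elif y0 == y1:
                wins += 0.5
    return wins / (len(non_exposed) * len(exposed))
```

A value of 0.5 indicates no separation between the groups, while values near 0 or 1 indicate that one group almost always has the better response.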
Power Analysis for the Mixed Linear Model
Dixon, Cheryl Annette 01 January 1996
Power analysis is becoming standard in inference-based research proposals and is used to support the proposed design and sample size. The choice of an appropriate power analysis depends on the choice of the research question, measurement procedures, design, and analysis plan. The "best" power analysis, however, will have many features of a sound data analysis. First, it addresses the study hypothesis, and second, it yields a credible answer. Power calculations for standard statistical hypotheses based on normal theory have been defined for t-tests through the univariate and multivariate general linear models. For these statistical methods, the approaches to power calculations have been presented based on the exact or approximate distributions of the test statistics in question. Through the methods proposed by O'Brien and Muller (1993), the noncentrality parameter for the noncentral distribution of the test statistics for the univariate and multivariate general linear models is expressed in terms of its distinct components. This in turn leads to methods for calculating power which are efficient and easy to implement. As more complex research questions are studied, more involved methods have been proposed to analyze data. One such method includes the mixed linear model. This research extends the approach to power calculation used for the general linear model to the mixed linear model. Power calculations for the mixed linear model will be based on the approximate F statistic for testing the mixed model's fixed effects proposed by Helms (1992). The noncentrality parameter of the approximate noncentral F for the mixed model will be written in terms of its distinct components so that a useful and efficient method for calculating power in the mixed model setting will be achieved. In this research, it has been found that the rewriting of the noncentrality parameter varies depending on study design.
Thus, the noncentrality parameters for three specific cases of study design are derived.
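For orientation, the starting point that the abstract extends can be written out for the univariate general linear model. The block below is the standard textbook form, stated under my own notation (the symbols C, theta_0, a, r, and N are assumptions, not taken from the dissertation); it does not show the mixed-model noncentrality parameters that the dissertation derives.

```latex
% Univariate general linear model: y = X\beta + \varepsilon,
% \varepsilon \sim N(0, \sigma^2 I_N), testing H_0 : C\beta = \theta_0,
% with a = \operatorname{rank}(C) and r = \operatorname{rank}(X).
\[
\lambda = \frac{(C\beta - \theta_0)^\top
          \left[ C (X^\top X)^{-} C^\top \right]^{-1}
          (C\beta - \theta_0)}{\sigma^2},
\qquad
\text{Power} = \Pr\left\{ F(a,\, N - r,\, \lambda) > f_{1-\alpha}(a,\, N - r) \right\}
\]
```

Here F(a, N - r, lambda) denotes a noncentral F random variable and f_{1-alpha} is the central F critical value, so power depends on the design, the effect, and the error variance only through lambda and the degrees of freedom.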
Statistical analysis methods for confounded data and clustered time-to-event data
Zhao, Qiang 12 July 2019
Confounding effects are a commonly encountered challenge in statistical inference. Ignoring confounders can cause bias in estimation. In practice, confounders are often unknown, which makes applying classical methods to deal with the confounding effect difficult. In the first thesis project, we apply the Gaussian Mixture Model (GMM) to help overcome the difficulty caused by a shortage of information about confounders. A new estimator is developed which shows better performance than the unadjusted estimator with regard to bias and confidence interval coverage probability. In the second thesis project, we consider the bias caused by an informative number of events in a recurrent-event data framework. Wang and Chang (1999) studied this bias and introduced an unbiased Kaplan-Meier-like estimator for recurrent event data. However, their method lacks corresponding rank tests to compare survival estimates among different groups. In this thesis project, we extend three commonly used rank tests to compare within group estimates based on Wang and Chang’s unbiased survival estimator. We also compare the power of our new method and the clustered rank test method which did not consider the informativeness of number of events. In addition, we show how to estimate the hazards ratio based on the log-rank test statistics. The unbiasedness of the log hazards ratio estimator calculated based on the extended log-rank test statistic is confirmed via simulation. In the third thesis project, we extend Firth’s correction method for the maximum partial likelihood estimator (MPLE) to clustered survival data. Heinze and Schemper (2001) showed that Firth’s correction method is applicable to the Cox regression estimates for survival data with small numbers of events or even with the monotone likelihood problem. However, this problem has not been solved in the clustered survival data setting. 
In this dissertation project, we extend Firth’s correction method by adopting a robust variance estimator to calculate the correct variability and reduce bias for the MPLE in clustered survival data analysis.
Evaluation of marker density for population stratification adjustment and of a family-informed phenotype imputation method for single variant and variant-set tests of association
Chen, Yuning 07 November 2018
Whole exome sequencing (WES) data cover only 1% of the genome and are designed to capture variants in coding regions of genes. When associating genetic variations with an outcome, there are multiple issues that could affect the association test results. This dissertation will explore two of these issues: population stratification and missing data. Population stratification may cause spurious association in analysis using WES data, an issue also encountered in genome-wide association studies (GWAS) using genotyping array data. Population stratification adjustments have been well studied with array-based genotypes but need to be evaluated in the context of WES genotypes where a smaller portion of the genome is covered. Secondly, sample size is a major component of statistical power, which can be reduced by missingness in phenotypic data. While some phenotypes are hard to collect due to cost and loss to follow-up, correlated phenotypes that are easily collected and are complete can be leveraged in tests of association. First, we compare the performance of GWAS and WES markers for population stratification adjustments in tests of association. We evaluate two established approaches: principal components (PCs) and mixed effects models. Our results illustrate that WES markers are sufficient to correct for population stratification. Next, we develop a family-informed phenotype imputation method that incorporates information contained in family structure and correlated phenotypes. Our method has higher imputation accuracy than methods that do not use family members and can help improve power while achieving the correct type-I error rate. Finally, we extend the family-informed phenotype imputation method to variant-set tests. Single variant tests do not have enough power to identify rare variants with small effect sizes. Variant-set association tests have been proven to be a powerful alternative approach to detect associations with rare variants.
We derive a theoretical statistical power approximation for both burden tests and the Sequence Kernel Association Test (SKAT) and investigate situations where our imputation approach can improve power in association tests.
Evaluation and extension of a kernel-based method for gene-gene interaction tests of common variants
Xue, Luting 09 November 2016
Interaction is likely to play a significant role in complex diseases, and various methods are available for identifying interactions between variants in genome-wide association studies (GWAS). Kernel-based variance component methods such as SKAT are flexible and computationally efficient methods for identifying marginal associations. A kernel-based variance component method, called the Gene-centric Gene-Gene Interaction with Smoothing-sPline ANOVA model (SPA3G), was proposed to identify gene-gene interactions for a quantitative trait. For interaction testing, the SPA3G method performs better than some SNP-based approaches under many scenarios. In this thesis, we evaluate the properties of the SPA3G method and extend SPA3G using alternative p-value approximations and interaction kernels. This thesis focuses on common variants only. Our simulation results show that the allele matching interaction kernel, combined with the method of moments p-value approximation, leads to inflated type I error in small samples. For small samples, we propose a Principal Component (PC)-based interaction kernel and a 3-moment adjustment for computing p-values, which together yield more appropriate type I error. We also propose a weighted PC kernel that has higher power than competing approaches when interaction effects are sparse. By combining the two proposed kernels, we develop omnibus methods that obtain near-optimal power in most settings. Finally, we illustrate how to analyze the interaction between selected gene pairs on the age at natural menopause (ANM) from the Framingham Heart Study.