181

Evaluating multiple imputation methods for longitudinal healthy aging index - a score variable with data missing due to death, dropout and several missing data mechanisms

Kane, Elizabeth 27 January 2018 (has links)
The healthy aging index (HAI) is a score variable based on five clinical components. I assess how well it predicts mortality in a sample of older adults from the Framingham Heart Study (FHS). Over 30% of FHS participants have missing HAI across time; I investigate how well imputation methods perform in this setting. I run simulations to compare four methods of multiple imputation (MI) by fully conditional specification (FCS) and the complete case (CC) approach on estimation of means, correlations, and slopes of the HAI over time. I simulate multivariate normal data for each component of HAI at four time points, along with age and sex, using within- and across-time correlation patterns and the percentage of missing data seen in the observed FHS data. My methods of MI are cross-sectional FCS (XFCS, imputation model uses other components at the same time), longitudinal FCS (LFCS, uses the same component at all times, ignoring cross-component correlation), all FCS (AFCS, uses all components at all times) and 2-fold FCS (2fFCS, uses all components at current and adjacent times). I compare percent bias, confidence interval width, coverage probability and relative efficiency for three mechanisms of missing data (MCAR, MAR, MNAR), two sample sizes (n = 1000, 100), and two numbers of imputed datasets (m = 5, 20). All longitudinal methods (all but XFCS) yield nearly identical results with unbiased estimates of means, correlations and slopes. The gain in precision and relative efficiency is small when increasing from 5 to 20 imputations. Finally, I compare the imputation methods and CC analysis in survival models using HAI as a time-dependent variable to predict mortality. I simulate HAI data as described above, generate time-to-death using piecewise exponential models, and impose type I and random censoring on 32% of observations. CC analysis reduces the sample size by 10% and produces unbiased estimates but inflated standard errors. The three longitudinal imputation methods introduce minimal bias (<5%) in the hazard ratio estimates while reducing the standard error by up to 10% compared with CC. Overall, I show that multiple imputation using longitudinal methods is beneficial in the setting of repeated measurements of a score variable. It works well both in analyzing changes over time and in time-dependent survival analyses.
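As a rough illustration of the FCS machinery compared above, the sketch below simulates a correlated score at four time points, imposes MCAR missingness, and pools estimates across multiple imputations using scikit-learn's IterativeImputer (a MICE-style FCS engine). The correlation structure, missingness rate, and variable names are illustrative assumptions, not the thesis's actual FHS-based simulation design.

```python
# Sketch of FCS (MICE-style) multiple imputation for a repeated score variable.
# Assumptions: synthetic multivariate-normal "HAI"-like data, MCAR missingness,
# and scikit-learn's IterativeImputer standing in for the FCS engines compared above.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n, times = 1000, 4
# AR(1)-style correlation across the four time points (illustrative, not the FHS values)
rho = 0.6
cov = rho ** np.abs(np.subtract.outer(np.arange(times), np.arange(times)))
hai = rng.multivariate_normal(mean=np.linspace(50, 45, times), cov=4 * cov, size=n)

# Impose roughly 30% MCAR missingness after baseline
mask = rng.random((n, times)) < 0.30
mask[:, 0] = False                       # keep baseline observed
hai_obs = np.where(mask, np.nan, hai)

# m multiple imputations; sample_posterior=True gives stochastic (proper) draws
m = 5
estimates = []
for k in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=k, max_iter=10)
    completed = imp.fit_transform(hai_obs)
    estimates.append(completed.mean(axis=0))   # mean score at each time point

pooled_means = np.mean(estimates, axis=0)       # Rubin's rules: pooled point estimate
print("pooled time-point means:", np.round(pooled_means, 2))
```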
182

Various Approaches on Parameter Estimation in Mixture and Non-mixture Cure Models

Kutal, Durga Hari 13 October 2018 (has links)
Analyzing lifetime data with long-term survivors is an important topic in medical applications. Cure models are commonly used to analyze survival data that include a proportion of cured subjects, or long-term survivors. To accommodate this cured fraction, both mixture and non-mixture cure models are considered. In this dissertation, we utilize both maximum likelihood and Bayesian methods to estimate model parameters. Simulation studies are carried out to verify the finite sample performance of the estimation methods. Real data analyses are reported to illustrate goodness-of-fit under Fréchet, Weibull and Exponentiated Exponential susceptible distributions. Among the three parametric susceptible distributions, the Fréchet is the most promising.

Next, we extend the non-mixture cure model to include a change point in a covariate for right-censored data. The smoothed likelihood approach is used to address the problem of a log-likelihood function that is not differentiable with respect to the change point. The simulation study is based on the non-mixture change point cure model with an exponential distribution for the susceptible subjects. The simulation results reveal convincing performance of the proposed estimation method.
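A minimal sketch of the maximum likelihood side of a non-mixture (promotion-time) cure model, assuming a Weibull susceptible distribution; the simulated data, parameter values, and the simple mixture-style data generation are placeholders for illustration, not the dissertation's real data analyses.

```python
# Minimal sketch: maximum likelihood for a non-mixture (promotion-time) cure model
# with a Weibull susceptible distribution. In this model S_pop(t) = exp(-theta * F(t))
# and the cure fraction equals exp(-theta).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(1)

def neg_loglik(params, t, delta):
    log_theta, log_shape, log_scale = params
    theta, shape, scale = np.exp([log_theta, log_shape, log_scale])
    F = weibull_min.cdf(t, c=shape, scale=scale)
    logf = weibull_min.logpdf(t, c=shape, scale=scale)
    # events contribute log(theta * f(t)) - theta*F(t); censored contribute -theta*F(t)
    return -np.sum(delta * (np.log(theta) + logf) - theta * F)

# Simulate from a simple mixture formulation purely for illustration:
# cured subjects never fail, susceptible failure times are Weibull, censoring at t = 10.
n, cure_frac = 500, 0.3
cured = rng.random(n) < cure_frac
t_event = np.where(cured, np.inf,
                   weibull_min.rvs(c=1.5, scale=3.0, size=n, random_state=2))
t_cens = np.full(n, 10.0)
t = np.minimum(t_event, t_cens)
delta = (t_event <= t_cens).astype(float)

fit = minimize(neg_loglik, x0=np.zeros(3), args=(t, delta), method="Nelder-Mead")
theta_hat = np.exp(fit.x[0])
print("estimated cure fraction:", np.exp(-theta_hat))
```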
183

Is Complete Case Analysis Appropriate For Cox Regression with Missing Covariate Data?

Zhu, Min 25 May 2018 (has links)
Purpose: Complete case analysis of survival data with missing covariates in the Cox proportional hazards model relies heavily on strong and usually unverifiable missing data mechanism assumptions, such as missing completely at random (MCAR), to produce reasonable parameter estimates. Based on the nature of survival data, missing at random (MAR) for missing covariates can be further decomposed into 1) censoring ignorable missing at random (CIMAR) and 2) failure ignorable missing at random (FIMAR). Unlike MCAR and MAR, there are procedures to assess whether the missingness of covariates in survival data is consistent with CIMAR or FIMAR. In my thesis, I investigate the performance of complete case analysis under various missing data mechanisms in the Cox model and demonstrate the procedures for checking consistency with CIMAR or FIMAR.

Experimental design: For research involving missing data, simulation studies are especially useful for studying the performance of an estimation approach (e.g., complete case analysis) because all parameters are pre-specified and known. I simulate survival data with missing covariates under various missing data mechanisms, including MCAR, MAR, missing not at random (MNAR), CIMAR, and FIMAR. I then perform complete case Cox regression on the simulated datasets and compare results to determine which missingness mechanisms produce reasonable parameter estimates. Finally, I perform a two-step procedure, as outlined by Rathouz (2006), to check whether covariate missingness is consistent with CIMAR or FIMAR on a real dataset.

Results: The simulation study illustrates that when covariate missingness is FIMAR but not CIMAR, complete case Cox regression produces reasonable parameter estimates similar to when missingness is MCAR. When covariate missingness is CIMAR, complete case Cox regression produces biased parameter estimates. The two-step procedure suggests that covariate missingness in the Stanford heart transplant data is consistent with FIMAR.

Conclusions: Survival data with missing covariates that are FIMAR are appropriate for complete case analysis in Cox models; survival data with missing covariates that are CIMAR are not. Under independent censoring, it should be possible for researchers to check the consistency of missing covariates in survival data with the FIMAR and CIMAR assumptions. If missingness is consistent with FIMAR, complete case Cox regression should produce reasonable estimates. If missingness is consistent with CIMAR, or if the data are inconsistent with both CIMAR and FIMAR, complete case Cox regression may produce biased estimates and researchers should consider sensitivity analyses.
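The simulation idea can be sketched as follows: generate survival data, delete a covariate under different mechanisms, and fit complete-case Cox models with lifelines. The two mechanisms shown are simplified stand-ins (covariate-dependent versus failure-dependent missingness), not the exact CIMAR/FIMAR constructions used in the thesis, and the parameter values are invented.

```python
# Rough sketch: complete-case Cox regression when a covariate is missing under
# different mechanisms, compared against the true log hazard ratios.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n, beta1, beta2 = 2000, 0.5, -0.7

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# exponential survival times with log-hazard beta1*x1 + beta2*x2, independent censoring
t_event = rng.exponential(1.0 / np.exp(beta1 * x1 + beta2 * x2))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

def complete_case_fit(miss_prob):
    df = pd.DataFrame({"time": time, "event": event, "x1": x1,
                       "x2": np.where(rng.random(n) < miss_prob, np.nan, x2)})
    cph = CoxPHFitter().fit(df.dropna(), duration_col="time", event_col="event")
    return cph.params_

# Mechanism A: missingness of x2 depends only on the fully observed covariate x1
print(complete_case_fit(1 / (1 + np.exp(-x1))))
# Mechanism B: missingness of x2 depends on the failure indicator itself
print(complete_case_fit(np.where(event == 1, 0.6, 0.1)))
```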
184

False positive rates encountered in the detection of changes in periodontal attachment level

Gunsolley, John C. 01 January 1987 (has links)
This thesis demonstrates that the assumption of normality used by Goodson results in underestimation of the type I error rate of the tolerance method by a factor of 10. This underestimation is due to the positive kurtosis demonstrated in the distribution of replicate differences; therefore, the assumption of normality does not seem warranted. It is shown here that a resampling technique more accurately estimates the type I error rate.

The estimates of false positive rates have important implications in the field of periodontics. When diagnostic decisions are based on single measurements, false positive rates are high. Even when thresholds as high as 3 mm. are used, over 3 out of 10 sites identified as "changed" have not changed. Unfortunately, in the clinical practice of periodontics, single measurements are commonly used. Therefore, clinicians who make treatment decisions based on attachment level measurements may be treating a large percentage of sites that have not undergone destructive periodontal disease. Clinical periodontists generally regard a loss of attachment of 3 mm. or more as evidence of progressively worsening disease requiring additional therapy. The consequences of treating areas that are erroneously judged to have progressed have to be weighed against the consequences of not treating areas that are progressing. If a clinician treats sites when a change of 3 mm. in attachment level is detected, as many as 32% of those sites may not have progressed. However, if the change in attachment level is real and the site is not treated, a significant proportion of the attachment may be lost. Changes of 3 mm. are large compared to the length of the root of the tooth. Weine (1982, p. 208-209), using Black's (1902) description of tooth anatomy, presents average root lengths for 13 categories of teeth, ranging from 12 to 16.5 mm. If a tooth with a root of 14 mm. (near the middle of the range of average root length) shows a change in attachment level measurements of 3 mm., the clinician is faced with a dilemma as to whether the site should be treated. The dilemma is heightened if, prior to the 3 mm. change, the site had already lost 50% of its attachment; in this situation the 3 mm. change represents nearly half of the remaining attachment. For these reasons, better measurement techniques would be beneficial in the clinical practice of periodontics.

A controversy exists in the periodontal literature on the ability of single attachment level measurements to detect actual change in attachment level. Two recent reports are in general agreement with this study. Imrey (1986) evaluates the ability of single measurements of attachment level to find change and concludes: "If true disease is uncommon and sensitivity to it is not high, these false positives may exceed in number the true positives detected" (p. 521). Ralls and Cohen (1986) reach similar conclusions: "the major issue is that 'bursts' of change can be explained by chance events which arise from measurement error and which occur at low but theoretically expected levels" (p. 751). The results of the present research demonstrate that a large percentage of the perceived change in attachment level is due to measurement error, but not to the degree that Imrey (1986) and Ralls and Cohen (1986) suggest; these researchers attribute almost all attachment level change to measurement error. In contrast, Aeppli, Boen, and Bandt (1984) reach a different conclusion: "using an observed increase of greater than 1 mm. as a diagnostic rule leads to high sensitivity and yet satisfactorily high specificity" (p. 264).

All three of the above referenced studies base their conclusions on estimates of sensitivity and specificity, though the methods of obtaining those estimates vary. Aeppli, Boen, and Bandt (1984) base their estimates of specificity and sensitivity on a calibration study involving 34 patients and 3 examiners; their distribution of differences in replicated measurements is similar to the distribution that Goodson (1986) reports. Imrey (1986) and Ralls and Cohen (1986), instead of using actual data, simulate the distribution of differences using a normal approximation with standard deviations of 1.125 mm. and 1 mm., respectively. Even though the methods of obtaining data vary, all the reports obtain high values of specificity (Table 6). However, estimates of sensitivity vary both within and among the three studies. Table 6 demonstrates that for similar thresholds the studies obtain a wide range of sensitivity estimates. Within each study, estimates of sensitivity are highly dependent on the assumed magnitude of actual change and the threshold used to detect the change: as the threshold decreases or the assumed attachment level change increases, sensitivity increases. The wide range of estimates that can be obtained within a study is demonstrated by Ralls and Cohen (1986), whose estimates of sensitivity range from .0668 to .9772. As discussed in chapter 1, the broad range of sensitivity estimates, and the arbitrary assumptions on which they rest, calls their value into question.
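The resampling idea can be sketched as below: draw from an empirical distribution of replicate measurement differences and estimate how often an unchanged site crosses a detection threshold. The heavy-tailed stand-in for the replicate differences is an assumption for illustration (the thesis uses observed calibration data), and the quantity computed here is the per-site type I error of the threshold rule, not the thesis's 32% figure, which refers to the proportion of detected changes that are false.

```python
# Sketch: estimate the false-positive rate of a "change >= threshold" rule by
# resampling from an empirical distribution of replicate measurement differences.
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for observed replicate differences (mm): heavy tails (positive kurtosis)
replicate_diffs = rng.standard_t(df=3, size=5000) * 0.8

def false_positive_rate(diffs, threshold, n_boot=100_000):
    # A "detected change" at an unchanged site is a resampled measurement difference
    # whose absolute value meets or exceeds the threshold.
    draws = rng.choice(diffs, size=n_boot, replace=True)
    return np.mean(np.abs(draws) >= threshold)

for thr in (1.0, 2.0, 3.0):
    print(f"threshold {thr} mm: estimated per-site false-positive rate "
          f"{false_positive_rate(replicate_diffs, thr):.4f}")
```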
185

Conditional Associations with Big Data: Estimating Adjusted Rank Correlations in the Electronic Health Record

Fu, Lingjun 03 August 2017 (has links)
In this thesis, we apply and adapt a new method to assess conditional associations in a large dataset from the Vanderbilt University Medical Center Electronic Health Record (EHR). We estimate pairwise rank correlations among disease status and lab values in the EHR after adjusting for demographic information. Our covariate-adjusted rank correlations involve fitting cumulative probability models (CPMs), extracting probability-scale residuals (PSRs) from these models, and computing the sample correlation between PSRs for different outcomes. This approach is rank-based, robust, and applicable to a variety of data types. Computational challenges arise with large datasets, particularly when we apply these methods to continuous outcome variables such as most lab values; we propose some workaround solutions. We present estimates and confidence intervals for the partial Spearman's rank correlations among all pairwise combinations of the 250 most frequent ICD codes and 50 lab results for 472,570 patients with data in the EHR. We also present results stratified by sex and diabetes status, demonstrating how to assess differences in correlations between population strata.
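A sketch of the PSR-based adjusted rank correlation, approximating the cumulative probability model with an ordered logit on binned outcomes via statsmodels' OrderedModel; the synthetic covariates, the two lab-like outcomes, and the 20-bin discretization are assumptions for illustration, not the EHR data or the thesis's computational workarounds.

```python
# Sketch: covariate-adjusted (partial Spearman-type) correlation via
# probability-scale residuals (PSRs) from cumulative probability models.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 2000
age = rng.normal(60, 10, n)
sex = rng.integers(0, 2, n)
Z = pd.DataFrame({"age": age, "sex": sex})

# Two lab-like outcomes that share dependence beyond the covariates
shared = rng.normal(size=n)
lab1 = 0.03 * age + 0.2 * sex + shared + rng.normal(size=n)
lab2 = -0.02 * age + 0.5 * shared + rng.normal(size=n)

def probability_scale_residuals(y, Z, bins=20):
    y_cat = pd.qcut(pd.Series(y), q=bins, duplicates="drop")  # ordered categorical outcome
    res = OrderedModel(y_cat, Z, distr="logit").fit(method="bfgs", disp=False)
    probs = np.asarray(res.predict(Z))                        # n x (number of categories)
    codes = y_cat.cat.codes.to_numpy()
    cum = np.cumsum(probs, axis=1)
    rows = np.arange(len(codes))
    p_lower = np.where(codes > 0, cum[rows, codes - 1], 0.0)  # P(Y < y_i | Z_i)
    p_upper = 1.0 - cum[rows, codes]                          # P(Y > y_i | Z_i)
    return p_lower - p_upper                                  # PSR in [-1, 1]

psr1 = probability_scale_residuals(lab1, Z)
psr2 = probability_scale_residuals(lab2, Z)
print("covariate-adjusted rank correlation:", round(np.corrcoef(psr1, psr2)[0, 1], 3))
```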
186

Improving Modern Techniques of Causal Inference: Finite Sample Performance of ATM and ATO Doubly Robust Estimators, Variance Estimation for ATO Estimators, and Contextualized Tipping Point Sensitivity Analyses for Unmeasured Confounding

D'Agostino McGowan, Lucy 28 March 2018 (has links)
While estimators that incorporate both direct covariate adjustment and inverse probability weighting have drawn considerable interest, their finite sample properties have been challenged in seminal papers such as Freedman and Berk (2008). We derive a doubly robust ATO estimator and demonstrate excellent finite sample performance for the ATO and ATM doubly robust estimators in the setting of Freedman and Berk (2008). The methods and performance of variance estimators for IPW and IPW doubly robust estimators incorporating the recently defined ATO weights are an important open question in the field. We derive the large-sample variance estimator for the ATO doubly robust estimator for generalized linear models with identity, log, or logistic links. We conduct simulations to compare this estimator to common model-fitting practices, demonstrating the conditions under which our estimated variance is preferred. Unobserved confounding remains a limitation for doubly robust estimators. We reframe the seminal work of Rosenbaum and Rubin (1983), Lin, Psaty, and Kronmal (1998), and VanderWeele and Ding (2017) into a formulation of sensitivity analysis for unmeasured confounding that appeals to medical researchers. We offer guidelines for anchoring the tipping point analysis in the context of the study and introduce the R package tipr.
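For intuition, the sketch below computes an overlap-weighted (ATO) doubly robust estimate by tilting a standard augmented estimator with e(x)(1 - e(x)); this is a generic form under assumed working models, not the exact estimator or variance derivation from the dissertation, and the simulated data are invented.

```python
# Minimal sketch of an overlap-weighted (ATO) doubly robust estimate: combine a
# propensity model with outcome regressions, tilting by h(x) = e(x) * (1 - e(x)).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=(n, 3))
e_true = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, e_true)
y = 1.0 * a + x @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)   # true effect = 1

# Working models: propensity score and treatment-arm outcome regressions
e_hat = LogisticRegression(max_iter=1000).fit(x, a).predict_proba(x)[:, 1]
m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)
m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)

h = e_hat * (1 - e_hat)                      # ATO tilting function (overlap weights)
augmented = (m1 - m0
             + a * (y - m1) / e_hat
             - (1 - a) * (y - m0) / (1 - e_hat))
ato_dr = np.sum(h * augmented) / np.sum(h)
print("doubly robust ATO estimate:", round(ato_dr, 3))
```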
187

On Optimal Prediction Rules With Prospective Missingness and Bagged Empirical Null Inference in Large-Scale Data

Mercaldo, Sarah Fletcher 18 September 2017 (has links)
This dissertation consists of three papers related to missing data, prediction, and large-scale inference. The first paper defines the problem of obtaining predictions from an existing clinical risk prediction model when covariates are missing. We introduce the Pattern Mixture Kernel Submodel (PMKS) approach - submodels fit within each missing data pattern - which minimizes prediction error in the presence of missingness. PMKS is explored in simulations and a case study, outperforming standard simple and multiple imputation techniques. The second paper introduces the Bagged Empirical Null p-value, a new algorithm that combines existing bagging and empirical null methodology to identify important effects in massive high-dimensional data. We illustrate the approach using a well-known leukemia gene example, where we uncover new findings supported by previously published bench work, and we evaluate the algorithm's performance in novel pseudo-simulations. The third paper gives recommendations for including the outcome in the imputation model during model construction, validation, and application. We suggest including the outcome only for imputation of missing covariate values during model construction, to obtain unbiased parameter estimates. When the outcome is used in the imputation algorithm during the validation step, we show through simulation that the model prediction metrics are optimistically inflated and that the actual pragmatic model performance would be inferior to the validated results. While the three papers presented here provide foundations for missing data and large-scale inferential techniques, these ideas are applicable to a wide range of biomedical settings.
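A minimal sketch of the pattern-submodel idea: fit one model per missing-data pattern using only the covariates observed in that pattern, and score new subjects with the submodel matching their pattern. The logistic working model, covariate names, and missingness rates are illustrative assumptions and omit the kernel aspects of the full PMKS methodology.

```python
# Sketch: one prediction submodel per missing-data pattern, using only the
# covariates observed within that pattern.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(df.x1 + 0.5 * df.x2 - df.x3))))
# Prospective missingness: x2 and x3 are each unavailable for some subjects
df.loc[rng.random(n) < 0.3, "x2"] = np.nan
df.loc[rng.random(n) < 0.2, "x3"] = np.nan

covariates = ["x1", "x2", "x3"]
pattern = df[covariates].isna().apply(tuple, axis=1)

# One submodel per observed pattern, trained on the covariates present in that pattern
submodels = {}
for pat, idx in df.groupby(pattern).groups.items():
    obs_cols = [c for c, miss in zip(covariates, pat) if not miss]
    sub = df.loc[idx]
    submodels[pat] = (obs_cols, LogisticRegression().fit(sub[obs_cols], sub["y"]))

def predict_risk(row):
    # Route each subject to the submodel that matches their missingness pattern
    pat = tuple(row[covariates].isna())
    obs_cols, model = submodels[pat]
    return model.predict_proba(row[obs_cols].to_frame().T)[0, 1]

print(predict_risk(df.iloc[0]))
```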
188

Design and Analysis Considerations for Complex Longitudinal and Survey Sampling Studies

Mercaldo, Nathaniel David 21 September 2017 (has links)
Pre-existing cohort data (e.g., electronic health records) are increasingly available, and the need for novel and efficient uses of these data is paramount given resource constraints. This dissertation consists of three chapters on the design and analysis of longitudinal and survey sampling studies that utilize these types of data. In chapter one, we extend outcome-dependent sampling (ODS) designs for longitudinal binary data to permit data collection in two stages. We consider two subclasses of designs: fixed designs, where the designs at each stage are pre-specified, and adaptive designs, which utilize stage-one data to improve the design choice at stage two. We demonstrate that data from both stages can be aggregated to generate valid parameter estimates using ascertainment-corrected maximum likelihood methods. Efficiency gains are observed compared to random sampling and, in certain situations, single-stage ODS designs. In chapter two, we investigate the effects of an imperfect sampling frame on the design and analysis of complex survey data. We explore the impact of stratum misclassification on the choice of study design, on the operating characteristics of survey estimators, and on the appropriateness of two common approaches to survey data analysis. Stratified sampling is recommended over random sampling if interest lies in making inferential statements regarding rare subgroups. In the presence of misclassification, the relative efficiency depends on the subgroup prevalence, and analytic methods that account for the design are still required for valid inference. In chapter three, we introduce the MMLB R package, which estimates parameters of marginalized regression models for longitudinal binary data. These models are described, and estimation procedures are outlined under random and ODS sampling schemes. We provide examples demonstrating how to fit these models and how data may be generated under a pre-specified marginal mean model. We hope these chapters provide specific and general insights that will improve our ability to conduct efficient research studies under resource constraints.
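As a toy illustration of the chapter-two recommendation, the sketch below compares the variability of a rare-subgroup mean under simple random sampling versus a design that oversamples the rare stratum; the population values, prevalence, and sample budget are invented, and frame misclassification is not modeled here.

```python
# Toy comparison: precision of a rare-subgroup mean under simple random sampling
# versus stratified sampling that oversamples the rare stratum.
import numpy as np

rng = np.random.default_rng(8)
N = 100_000
rare = rng.random(N) < 0.02                     # rare subgroup, 2% prevalence
y = rng.normal(loc=np.where(rare, 3.0, 0.0))    # outcome differs by subgroup

def srs_estimate(n=1000):
    idx = rng.choice(N, size=n, replace=False)
    sub = idx[rare[idx]]                         # the few rare subjects that were drawn
    return y[sub].mean() if sub.size else np.nan

def stratified_estimate(n_rare=500):
    # allocate half of a 1,000-subject budget to the rare stratum
    rare_idx = np.flatnonzero(rare)
    sample = rng.choice(rare_idx, size=n_rare, replace=False)
    return y[sample].mean()

sims_srs = [srs_estimate() for _ in range(500)]
sims_str = [stratified_estimate() for _ in range(500)]
print("SRS subgroup-mean SD:       ", np.nanstd(sims_srs))
print("Stratified subgroup-mean SD:", np.std(sims_str))
```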
189

Talking About My Care: Detecting Mentions of Hormonal Therapy Adherence Behavior From an Online Breast Cancer Community

Yin, Zhijun 27 November 2017 (has links)
Hormonal therapy adherence is challenging for many patients with hormone-receptor-positive breast cancer. Gaining insight into their adherence behavior would help improve outcomes by pinpointing, and eventually addressing, why patients fail to adhere. While traditional adherence studies rely on survey-based methods or electronic medical records, online health communities provide a supplemental data source for learning about such behavior, often on a much larger scale. In this paper, we focus on an online breast cancer discussion forum and propose a framework to automatically extract hormonal therapy adherence behavior (HTAB) mentions. The framework compares medical term usage between posts describing a patient taking hormonal therapy medication and posts describing an interruption of treatment (e.g., stopping or pausing medication). We show that by using shallow neural networks, in the form of word2vec, the learned features can be applied to build efficient HTAB mention classifiers. Through medical term comparison, we find that patients who exhibit interruption behavior are more likely to mention depression and their care providers, while patients with continuation behavior are more likely to mention common side effects (e.g., hot flashes, nausea and osteoporosis), vitamins, and exercise.
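The feature-construction step can be sketched with gensim's word2vec: learn embeddings from tokenized posts, average them into post-level vectors, and train a classifier to flag HTAB mentions. The tiny corpus, tokenization, and labels below are placeholders, not data from the forum studied here.

```python
# Sketch: word2vec embeddings averaged into post-level features for an HTAB
# mention classifier. Corpus and labels are placeholders for illustration.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

posts = [
    "i stopped taking tamoxifen because of the side effects".split(),
    "started my arimidex again after talking to my oncologist".split(),
    "the weather has been lovely this week".split(),
    "paused letrozole due to joint pain and hot flashes".split(),
]
labels = [1, 1, 0, 1]     # 1 = mentions adherence behavior (placeholder annotations)

w2v = Word2Vec(sentences=posts, vector_size=50, window=5, min_count=1, epochs=50, seed=9)

def post_vector(tokens):
    # average the word vectors of tokens present in the vocabulary
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([post_vector(p) for p in posts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```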
190

A Comparison of Approaches for Unplanned Sample Size Changes in Phase II Clinical Trials

Olson, Molly Ann 20 June 2017 (has links)
Oncology phase II clinical trials are used to evaluate the initial effect of a new regimen and to determine whether further study in a phase III clinical trial is warranted. Two-stage designs with an early futility stop are commonly used in these phase II trials. The attained sample sizes in these trials often differ from the designed sample sizes due to over- and under-enrollment. Currently, when the attained sample size differs from that planned, common practice is to treat the attained sample size as if it were the planned one, and this practice leads to invalid inference. In this thesis, we examine the problems and solutions in hypothesis testing for two-stage phase II clinical trial designs when attained sample sizes differ from the planned sample sizes. We describe existing methods for redesigning trials when there is over- or under-enrollment in either the first or second stage and introduce a new method for redesigning a two-stage clinical trial when the first-stage sample size deviates from that planned. We focus our investigation on over- or under-enrollment in the first stage. We compare the frequentist methods of Chang et al., Olson and Koyama, and the Likelihood two-stage design by applying these methods to two-stage designs with first-stage deviations of +/- 10. We examine type I error rates, power, probability of early termination, and expected sample size under the null hypothesis for a number of two-stage designs. We also compare error rates for these methods using a Monte Carlo simulation.
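To see why treating attained sample sizes as planned distorts inference, the sketch below computes the exact type I error, probability of early termination, and expected sample size of a two-stage design, then recomputes them when the first-stage sample size deviates by +/- 10 while the planned critical values are kept. The design parameters (r1 = 4, n1 = 19, r = 15, n = 54, p0 = 0.2) are illustrative, not taken from the thesis.

```python
# Sketch: exact operating characteristics of a Simon-style two-stage design,
# evaluated at the planned and at deviated first-stage sample sizes.
from scipy.stats import binom

def two_stage_operating_chars(p, r1, n1, r, n):
    n2 = n - n1
    pet = binom.cdf(r1, n1, p)                      # probability of early termination
    # reject H0 iff X1 > r1 (continue) and X1 + X2 > r
    reject = sum(binom.pmf(x1, n1, p) * binom.sf(r - x1, n2, p)
                 for x1 in range(r1 + 1, n1 + 1))
    en = n1 + (1 - pet) * n2                        # expected sample size
    return reject, pet, en

p0, r1, n1, r, n = 0.2, 4, 19, 15, 54
print("planned n1:", two_stage_operating_chars(p0, r1, n1, r, n))
for attained_n1 in (n1 - 10, n1 + 10):              # first-stage deviations of +/- 10
    # keep the planned critical values and second-stage size, change only n1
    print("attained n1 =", attained_n1, "->",
          two_stage_operating_chars(p0, r1, attained_n1, r, n - n1 + attained_n1))
```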
