141 |
Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data
Kang, Dongwan Don 22 September 2011
Genomic meta-analysis has been applied to many biological problems to gain power from increased sample sizes and to validate the results of individual studies. For study selection, however, most of the literature relies on qualitative or ad hoc numerical criteria, and there has been no effort to develop a rigorous quantitative evaluation framework. In this thesis, we propose several quantitative measures to assess the quality of a study for a meta-analysis. We applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies, and confirmed the necessity of proper exclusion criteria using real meta-analyses. Simulation studies showed the effectiveness and robustness of the proposed criteria. Second, we investigated simultaneous dimension reduction frameworks for downstream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to explore meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We propose several simultaneous dimension reduction methods based on principal component analysis (PCA). Using five examples of real microarray data, we show the information gained by adopting the proposed procedures in terms of better visualization and prediction accuracy. In the third component, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) and chronic obstructive pulmonary disease (COPD). By applying unsupervised learning techniques to both clinical phenotypes and gene expression data obtained from a large, well-characterized cohort, we showed the existence of an intermediate phenotypic group that has characteristics of both diseases and divergent phenotypes in clinical and molecular features.
The public health importance of our findings is that current clinical definitions and classifications do not account for the large number of patients with intermediate phenotypes or less common features, who are often excluded from clinical trials and epidemiological reports.
|
142 |
Joint Modeling Of Censored Longitudinal and Event Time Data
Pike, Francis 23 September 2011
Longitudinal censoring is a common artifact when evaluating biomarkers and an obstacle when jointly investigating the longitudinal nature of the data and its impact on the survival prognosis of a study population. To fully appreciate the complexity of this scenario, one has to devise a modeling strategy that can simultaneously account for (i) longitudinal censoring, (ii) outcome-dependent dropout, and potentially (iii) correlated biomarkers. In this thesis we propose a novel joint modeling approach that accounts for these issues by linking a univariate or multivariate Tobit mixed effects model to a suitable parametric event time distribution. This method is significant to public health research because it enables researchers to evaluate the evolution of the disease process in the presence of complex biomarker data where there may be censoring, correlation, and outcome-dependent dropout. The approach allows the data to be analyzed in a single unified framework. The performance of the proposed joint Tobit model will be compared with the commonly used "fill-in" methods for censored longitudinal data in a joint modeling framework. Furthermore, we will show that the proposed model is fairly straightforward to implement in commercially available software, thus avoiding the complexity and problem-specific nature of the expectation-maximization (EM) algorithm.
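As a rough illustration of the censoring mechanism a Tobit model handles, the sketch below evaluates a censored-normal log-likelihood for measurements subject to a lower detection limit. It is a minimal sketch under simplifying assumptions: a single fixed mean and standard deviation, no random effects, and no event time component, so it is not the joint model proposed in the thesis. The function and parameter names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def tobit_loglik(values, censored, mu, sigma, limit):
    """Log-likelihood for data left-censored at a detection limit.

    Observed values contribute the normal log density; censored values
    contribute log P(Y <= limit), the probability of falling below the limit.
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        if is_censored:
            total += math.log(normal_cdf(limit, mu, sigma))
        else:
            total += normal_logpdf(y, mu, sigma)
    return total
```

A "fill-in" approach would instead replace each censored value with a constant (for example, the limit itself) and use the ordinary normal density throughout, which is the practice the abstract compares against.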
|
143 |
Statistical Methods for Evaluating Biomarkers Subject to Detection Limit
Kim, Yeonhee 22 September 2011
Numerous candidate biomarkers have emerged for different diseases as cost-effective diagnostic tools. The increasing effort to discover informative biomarkers highlights the need for valid statistical modeling and evaluation. Our focus is on biomarker data that are both measured repeatedly over time and censored by the sensitivity of the given assay. Inappropriate handling of these types of data can bias results and lead to erroneous medical decisions.
In the first topic, we extend discriminant analysis to censored longitudinal biomarker data based on linear mixed models and a modified likelihood function. Biomarker performance is evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). The simulation study shows that the proposed method improves both parameter and AUC estimation over substitution methods when the normality assumption is satisfied for the biomarker data. Our method is applied to a biomarker study of acute kidney injury patients. In the second topic, we introduce a simple and practical evaluation method for censored longitudinal biomarker data. A modification of the linear combination approach of Su and Liu enables us to calculate the optimum AUC as well as the relative importance of measurements from each time point. The simulation study demonstrates that the proposed method performs well in a practical situation. An application to real-world data is provided. In the third topic, we consider censored time-invariant biomarker data used to discriminate time to event or cumulative events by a particular time point. The C-index and the time-dependent ROC curve are often used to measure the discriminant potential of a survival model. We extend these methods to censored biomarker data based on a joint likelihood approach. The simulation study shows that the proposed methods yield accurate discrimination measures. An application to a biomarker study is provided.
Both early detection and accurate prediction of disease are important for managing serious public health problems. Because many diagnostic tests are based on biomarkers, the discovery of informative biomarkers is an active research area in public health. Our methodology helps public health researchers identify promising biomarkers when the measurements are censored by detection limits.
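The linear combination approach of Su and Liu mentioned in the second topic can be sketched as follows for the uncensored, multivariate normal case: the AUC-maximizing coefficients are proportional to the inverse of the summed covariance matrices applied to the mean difference, and the resulting AUC is the standard normal CDF of the square root of the associated quadratic form. This sketch assumes the group means and covariances are already known or estimated; the thesis's modification for censored data is not reproduced, and all names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def su_liu(mu_case, mu_ctrl, cov_case, cov_ctrl):
    """Return (coefficients, AUC) of the best linear combination under normality."""
    n = len(mu_case)
    delta = [mu_case[i] - mu_ctrl[i] for i in range(n)]
    summed = [[cov_case[i][j] + cov_ctrl[i][j] for j in range(n)] for i in range(n)]
    coef = solve(summed, delta)  # (cov_case + cov_ctrl)^{-1} (mu_case - mu_ctrl)
    quad = sum(delta[i] * coef[i] for i in range(n))
    auc = 0.5 * (1.0 + math.erf(math.sqrt(quad) / math.sqrt(2.0)))
    return coef, auc
```

With repeated measurements, each component of the coefficient vector corresponds to one time point, which is how the relative importance of measurements can be read off.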
|
144 |
Longitudinal Data Analysis in Depression Studies: Assessment of Intermediate-Outcome-Dependent Dynamic Interventions
Hsu, Yenchih 23 September 2011
Longitudinal studies of the treatment of mental diseases, such as chronic forms of major depressive disorder, frequently use sequential randomization designs to investigate treatment strategies. Outcomes in such studies often consist of repeated measurements of scores, such as the 24-item Hamilton Rating Scale for Depression, throughout the duration of the therapy. The goal is to compare different sequences of treatments to find the most beneficial one for each patient. Because treatments are applied sequentially, eligibility for a given treatment assignment depends on previous treatments and outcomes. Two issues make the analysis of data from such sequential designs different from that of standard longitudinal data: (1) the randomization in subsequent stages of patients who fail to respond in the previous stage; and (2) the drop-out of patients, for which the assumption of missing completely at random is usually not realistic. In this dissertation, we show how the inverse-probability-weighted generalized estimating equations (IPWGEE) method can be used to draw inference for treatment regimes from two-stage studies. Specifically, we show how to construct weights and use them in the IPWGEE to derive consistent estimators for the effects of treatment regimes, and how to compare them. Large-sample properties of the proposed estimators are derived analytically and examined through simulations. We demonstrate our methods by applying them to a depression dataset.
Public Health Significance: Mental illness is becoming a major public health challenge. Many investigators have introduced strategies of multiple treatments as an alternative to a single treatment strategy for patients with chronic depressive disorders. As the complexity of study designs increases, sophisticated statistical methods are needed to provide valid inference. This dissertation demonstrates the importance of statistical considerations in estimating the effects of depression treatment regimes from two-stage longitudinal studies.
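To make the idea of regime-specific weights concrete, the sketch below computes inverse-probability weights for a two-stage design in which responders to the first-stage treatment are not re-randomized and non-responders are randomized to a second-stage treatment. This is a simplified illustration, assuming known randomization probabilities and no dropout; the weights used in the dissertation's IPWGEE (which also address dropout) may differ. All names are hypothetical.

```python
def regime_weight(a1, responder, a2, regime_a1, regime_a2, p_a1, p_a2):
    """Weight of one patient for the regime 'give regime_a1; if no response, give regime_a2'.

    p_a1 is the probability of being randomized to regime_a1 at stage one;
    p_a2 is the probability of being randomized to regime_a2 at stage two.
    Patients whose observed treatments are inconsistent with the regime get weight 0.
    """
    if a1 != regime_a1:
        return 0.0
    if responder:
        return 1.0 / p_a1  # responders are consistent with every second-stage option
    if a2 != regime_a2:
        return 0.0
    return 1.0 / (p_a1 * p_a2)


def weighted_mean(outcomes, weights):
    """Normalized inverse-probability-weighted mean outcome under the regime."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
```

Non-responders who followed the regime are up-weighted because they also stand in for the non-responders who were randomized to a different second-stage treatment.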
|
145 |
Hierarchical Likelihood Inference on Clustered Competing Risks Data
Christian, Nicholas J. 23 September 2011
Frailty models, an extension of the proportional hazards model, are used to model clustered survival data. In some situations there may be competing risks within a cluster. When this happens, the basic frailty model is no longer appropriate. Depending on the purpose of the analysis, either the cause-specific hazard frailty model or the subhazard frailty model needs to be used. In this work, hierarchical likelihood (h-likelihood) methods are extended to provide a new method for fitting both types of competing risks frailty models. Methods for model selection as well as testing for covariate and clustering effects are discussed. Simulations show that in cases with little information, the h-likelihood method can perform better than the penalized partial likelihood method for estimating the subhazard frailty model. Additional simulations demonstrate that h-likelihood performs well when estimating the cause-specific hazard frailty model under both univariate and bivariate frailty distributions. A real example from a breast cancer clinical trial is used to demonstrate the use of h-likelihood to fit both types of competing risks frailty models.
Public health significance: When researchers have clustered survival data and the observations within those clusters can experience multiple types of events, the popular proportional hazards model is no longer appropriate and can lead to biased estimates. For the results of a clinical study to be meaningful, the estimated effects of treatments and other covariates need to be accurate. H-likelihood methods are an alternative to existing procedures and can provide less biased and more accurate information, which will ultimately lead to better patient care.
|
146 |
Use of Pseudo-observations in the Goodness-of-Fit Test for Gray's Time-Varying Coefficients Model
Kang, Hyung-joo 28 September 2011
Survival analysis has been used to estimate underlying survival or failure probabilities and to estimate the effects of covariates on survival times. The Cox proportional hazards regression model is the most commonly used approach. However, in practical situations, the assumption of proportional hazards (PH) is often violated. The assumption does not hold, for example, when a covariate has a time-varying effect. Several methods have been proposed to estimate this time-varying effect via a time-varying coefficient. The Gray time-varying coefficients (TVC) model is an extension of the Cox PH model that employs penalized spline functions to estimate time-varying coefficients. Currently, there is no method available to assess the overall goodness of fit of the Gray TVC model. In this study, we propose a method based on pseudo-observations. By using pseudo-observations, we are able to calculate residuals for all individuals at all time points. This avoids concerns about censoring and allows us to apply the residual plots used in general linear regression models to assess the overall goodness of fit of censored survival regression models. Perme and Andersen used the pseudo-observations method to assess the fit of the Cox PH model. We extend their method to assess the fit of the Gray TVC model and illustrate the approach on a model that predicts posttransplant survival probability among children who were under the age of 12 years, had end-stage liver disease, and underwent liver transplantation between January 2005 and June 2010.
The method has significant public health impact. The Cox PH model is the most cited regression method in medical research. When data violate the PH assumption, the Gray TVC model or an alternative should be used in order to obtain unbiased estimates of the survival function and correct inference on the relationship between potential covariates and survival. The proposed goodness-of-fit test offers a tool to investigate how well the model fits the data. If the results show a lack of fit, the model must be modified further in order to obtain more accurate estimates.
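The pseudo-observations that this approach builds on can be illustrated with the usual jackknife construction applied to the Kaplan-Meier estimator: for individual i at time t, the pseudo-observation is n times the full-sample estimate minus (n - 1) times the estimate with individual i left out. The sketch below is a minimal, unoptimized version of that construction only; it does not implement the residuals or the Gray TVC model discussed in the abstract, and the function names are hypothetical.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the survival probability S(t).

    times: list of follow-up times; events: list of booleans (True = event observed).
    """
    surv = 1.0
    event_times = sorted({u for u, d in zip(times, events) if d and u <= t})
    for u in event_times:
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, d in zip(times, events) if x == u and d)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations of S(t), one per individual."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    result = []
    for i in range(n):
        times_wo_i = times[:i] + times[i + 1:]
        events_wo_i = events[:i] + events[i + 1:]
        result.append(n * full - (n - 1) * kaplan_meier(times_wo_i, events_wo_i, t))
    return result
```

Without censoring, each pseudo-observation reduces to the indicator that the individual survived past t; with censoring, every individual still receives a value, which is what makes residuals available for all individuals at all time points.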
|
147 |
Comparison of prognostic markers for censored outcomes: application in the NSABP B-14 study
Yan, Peng 23 September 2011
Prognostic markers for the risk of recurrence or mortality are becoming popular and important in the decision-making process of cancer patients and their physicians. Patients with a good prognosis can avoid unnecessary chemotherapy and the resulting suffering. Receiver operating characteristic (ROC) curves are often used to assess and compare prognostic markers for binary outcomes. However, they cannot be used directly to assess prognostic markers for time-to-event outcomes, which are usually subject to censoring. Recently, several statistical methods, such as the C-index, the time-dependent ROC curve and the predictiveness curve, have been developed for this purpose. In early-stage estrogen receptor-positive (ER+) breast cancer, the 21-gene Oncotype DX assay and Adjuvant!, which is based on age, tumor size and grade, and other clinical variables, are widely used tools for patient prognosis and provide guidance in decision making. The recurrence score (RS) from the Oncotype DX assay and a risk index (RI) summarized from Adjuvant! both provide a quantitative evaluation of recurrence risk. Here we applied those recently developed statistical methods to compare the prognostic utility of RS and RI in ER+, node-negative (N0), tamoxifen-treated breast cancer patients enrolled in the National Surgical Adjuvant Breast and Bowel Project (NSABP) B-14 trial. We showed that RS was a stronger prognostic marker than RI, and that combining RS with clinical variables also improved prognostic utility in the NSABP B-14 trial. The results will help to improve treatment decisions for breast cancer patients in public health practice.
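One of the measures named above, the C-index, can be sketched in its simplest (Harrell-type) form: among pairs of patients whose ordering of event times is known despite censoring, it is the proportion in which the patient with the earlier event has the higher risk score. The version below is a minimal illustration that skips pairs with tied times; it is not necessarily the estimator used in the thesis, and the names are hypothetical.

```python
def harrell_c(times, events, scores):
    """Concordance index for right-censored data.

    times: follow-up times; events: True if the event was observed;
    scores: risk scores, where a higher score means higher predicted risk.
    A pair (i, j) is usable when i has an observed event strictly before time j.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half concordant
    return concordant / usable
```

Two markers, such as a recurrence score and a clinical risk index, could then be compared by computing this quantity for each on the same patients; a value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking.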
|
148 |
Generalized Linear Mixed Modeling to Examine the Relationship Between Self Efficacy and Smoking Cessation
Parzynski, Craig S 23 September 2011
The relationship between self efficacy and smoking cessation is unclear. Self efficacy is often viewed as a causal antecedent of future abstinence from smoking, a primary outcome of cessation studies. However, recent research has questioned whether participants' reports of self efficacy reflect previous abstinence success or failure rather than serving as a precursor. To elucidate the dynamic relationship between self efficacy and abstinence status, two generalized linear mixed models were developed. The first examined the ability of self efficacy to predict the next day's abstinence, while the second examined the ability of abstinence to predict self efficacy ratings taken later that same day. All data came from a 2 x 2 crossover trial examining how interest in quitting smoking and monetary reinforcement for abstinence affect the short-term effects of medication on abstinence from smoking. Participants received both medication and placebo conditions in consecutive phases in a counterbalanced order, with an ad lib smoking washout period in between. Abstinence from smoking and self efficacy were recorded daily during both medication phases. Participants were 124 smokers, mean age 31.1 (SE: 1.0), who smoked on average 16.3 (SE: 0.5) cigarettes per day and had a mean FTND score of 4.6 (SE: 0.1). The sample was 56.5% female. Results indicate that self efficacy is both a predictor of, and a reflection of, abstinence status. Models were validated using bootstrapping procedures, which revealed only a small amount of bias in the models. The effects observed in this study may be constrained by the timing of assessments as well as the duration of the cessation attempt.
Public Health Importance: Tobacco use accounts for 443,000 deaths each year. Therefore, the development of successful clinical assessments to monitor smoking cessation efforts is of the utmost importance. Self efficacy is a measure of confidence in one's ability to quit smoking. This study shows that the relationship between self efficacy and smoking cessation is bidirectional and may be influenced by the timing of assessments. Understanding this relationship may lead to more successful use of self efficacy as a clinical tool during smoking cessation attempts.
|
149 |
ISSUES IN META-ANALYSIS OF CANCER MICROARRAY STUDIES: DATA DEPOSITORY IN R AND A META-ANALYSIS METHOD FOR MULTI-CLASS BIOMARKER DETECTION
LU, SHU-YA 29 September 2009
Systematic information integration of multiple related microarray studies has become an important issue as the technology has matured significantly and become more prevalent and relevant to public health over the past decade. The aggregated information provides more robust and accurate biomarker detection. So far, published meta-analysis methods for this purpose mostly consider two-class comparisons. Methods for combining multi-class studies and for assessing expression pattern concordance are rarely explored. We first consider a natural extension that combines p-values from the traditional ANOVA model. Since p-values from ANOVA are not guaranteed to reflect concordant expression patterns across studies, we propose a multi-class correlation measure (MCC) to look specifically for biomarkers with concordant inter-class patterns across a pair of studies. For both approaches, we focus on identifying biomarkers differentially expressed in all studies (i.e. ANOVA-maxP and min-MCC). The min-MCC method is further extended to identify biomarkers differentially expressed in only a subset of studies using an optimally-weighted technique (OW-min-MCC). All methods are evaluated by simulation studies and by three meta-analysis applications: multi-tissue mouse metabolism data sets, multi-condition mouse trauma data sets and multi-malignant-condition human prostate cancer data sets. The results show the complementary strengths of the ANOVA-based and MCC-based approaches for different biological purposes. For detecting biomarkers with concordant inter-class patterns across studies, min-MCC has better power and performance. If biomarkers with discordant inter-class patterns across studies are expected and are of biological interest, ANOVA-maxP better serves this purpose.
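The maxP idea referred to above can be sketched for a single gene: the statistic is the largest of the per-study p-values, so it is small only when every study shows evidence of differential expression. The sketch assumes the per-study p-values are independent and uniform under the null, in which case the maximum of K p-values has CDF x^K; the thesis may instead obtain its reference distribution differently (for example, by permutation), and the function name is hypothetical.

```python
def max_p(p_values):
    """Combine per-study p-values for one gene with the maximum p-value statistic.

    Returns (statistic, combined_p). Assuming K independent Uniform(0, 1)
    p-values under the null, P(max <= x) = x**K, so the combined p-value
    is the statistic raised to the power K.
    """
    k = len(p_values)
    statistic = max(p_values)
    return statistic, statistic ** k
```

A single large p-value in any one study is enough to make the combined result non-significant, which matches the stated goal of finding biomarkers differentially expressed in all studies.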
|
150 |
A BAYESIAN TEST OF NORMALITY WITH A MIXTURE OF TWO NORMALS AS THE ALTERNATIVE AND APPLICATIONS TO 'CLUSTER' ANALYSIS
SYMONS, MICHAEL JOSEPH. January 1969
Thesis (Ph. D.)--University of Michigan.
|