  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Study and validation of data structures with missing values. Application to survival analysis

Serrat i Piè, Carles 21 May 2001 (has links)
In this work we treat three different methodologies (nonparametric, parametric and semiparametric) for handling missing-data patterns in a survival analysis context. The first two approaches are developed under the assumption that the investigator has enough information to assume the non-response mechanism is MCAR (Missing Completely at Random) or MAR (Missing at Random). In this situation, we adapt a bootstrap and bilinear multiple imputation scheme to draw the distribution of the parameters of interest. We also analyze the drawbacks encountered in obtaining correct inferences when the problem is treated fully parametrically, and propose some strategies to take into account the complementary information provided by other fully observed covariates.

However, in many situations the ignorability of the non-response probabilities cannot be assumed. We therefore develop a semiparametric method for survival analysis under a non-ignorable non-response pattern. First, for right-censored samples with completely observed covariates, we propose the Grouped Kaplan-Meier estimator (GKM) as an alternative to the standard KM estimator when interest lies in the survival at a finite number of fixed times. When the covariates are only partially observed, neither the stratified GKM estimator nor the stratified KM estimator can be computed directly from the sample. We therefore propose a class of estimating equations to obtain semiparametric estimates of the non-response probabilities, and substitute these estimates into the stratified GKM estimator. We refer to this new procedure as the Estimated Grouped Kaplan-Meier (EGKM) estimator. We prove that the GKM and EGKM estimators are square-root consistent and asymptotically normally distributed, and derive consistent estimators of their limiting variance-covariance matrices. The advantage of the EGKM estimator is that it provides asymptotically unbiased estimates of the survival under a flexible selection model for the non-response probabilities.

We illustrate the method with a cohort of HIV-infected patients with Tuberculosis. At the end of the application, a sensitivity analysis covering all types of non-response pattern, from MCAR to non-ignorable, allows the investigator to draw conclusions after analyzing all plausible scenarios and to evaluate the impact of the non-ignorability assumptions on the resulting inferences. We close the semiparametric approach by exploring the finite-sample behaviour of the EGKM estimator in a simulation study. Simulations under scenarios with different levels of censoring, non-response patterns and sample sizes show the good properties of the proposed estimator: for instance, the empirical coverage probabilities approach the nominal ones when the non-response pattern used in the analysis is close to the true pattern that generated the data. In particular, the estimator remains efficient in the least informative scenario considered (around 80% censoring and 50% missing data).
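Since the abstract centres on estimating survival at a finite number of fixed times, a minimal sketch of the standard Kaplan-Meier estimator evaluated on such a grid may help fix ideas. This is illustrative only: it is not the thesis's GKM/EGKM estimator (which additionally handles grouping and partially observed covariates), and the function name is ours.

```python
from itertools import groupby

def kaplan_meier_at(times, events, grid):
    """Kaplan-Meier survival estimates S(t) at the fixed times in `grid`.
    times: observed follow-up times; events: 1 = event, 0 = right-censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    s = 1.0
    steps = []  # (time, survival just after that time)
    for t, grp in groupby(data, key=lambda pair: pair[0]):
        grp = list(grp)
        d = sum(e for _, e in grp)      # events at time t
        if d:
            s *= 1.0 - d / at_risk      # KM multiplicative step
        steps.append((t, s))
        at_risk -= len(grp)             # events and censorings leave the risk set
    out = []
    for g in grid:
        val = 1.0
        for t, sv in steps:             # S(g) = value after last step time <= g
            if t <= g:
                val = sv
            else:
                break
        out.append(val)
    return out
```

Censored observations reduce the risk set without contributing a multiplicative step, which is what distinguishes KM from the naive empirical survival function.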
232

Survival analysis issues with interval-censored data

Oller Piqué, Ramon 30 June 2006 (has links)
Survival analysis is used in various fields to analyze data involving the duration between two events. It is also known as event history analysis, lifetime data analysis, reliability analysis or time-to-event analysis. One of the difficulties that arises in this area is the presence of censored data. The lifetime of an individual is censored when it cannot be measured exactly but partial information is available. Different circumstances produce different types of censoring. Interval censoring refers to the situation where the event of interest cannot be observed directly and is only known to have occurred during a random interval of time. This kind of censoring has generated substantial research in recent years and typically occurs when the individuals in a study are inspected or observed intermittently, so that an individual's lifetime is known only to lie between two successive observation times.

This PhD thesis is divided into two parts addressing two important issues in interval-censored data. The first part, Chapters 2 and 3, concerns formal conditions under which estimation of the lifetime distribution can be based on a well-known simplified likelihood. The second part, Chapters 4 and 5, is devoted to test procedures for the k-sample problem. The present work reproduces several materials that have already been published or submitted for publication.

In Chapter 1 we give the basic notation used throughout the thesis. We also describe the nonparametric approach to estimating the distribution function of the lifetime variable. Peto (1973) and Turnbull (1976) were the first authors to propose an estimation method based on a simplified version of the likelihood function. Other authors have studied the uniqueness of the solution given by this method (Gentleman and Geyer, 1994) or have improved it with new proposals (Wellner and Zhan, 1997).

Chapter 2 reproduces the paper of Oller et al. (2004). We prove the equivalence between different characterizations of noninformative censoring that have appeared in the literature, and we define a constant-sum condition analogous to the one derived in the context of right censoring. We prove as well that when either the noninformative condition or the constant-sum condition holds, the simplified likelihood can be used to obtain the nonparametric maximum likelihood estimator (NPMLE) of the failure time distribution function. Finally, we characterize the constant-sum property according to different types of censoring. In Chapter 3 we study the relevance of the constant-sum property to the identifiability of the lifetime distribution. We show that the lifetime distribution is not identifiable outside the class of constant-sum models, and that the lifetime probabilities assigned to the observable intervals are identifiable inside this class. We illustrate all these notions with several examples.

Chapter 4 has been partially published in the survey paper of Gómez et al. (2004). It gives a general view of the procedures that have been applied to the nonparametric problem of comparing two or more interval-censored samples. We also develop S-Plus routines implementing the permutational versions of the Wilcoxon, Logrank and t-tests for interval-censored data (Fay and Shih, 1998). This part of the thesis is completed in Chapter 5 with several proposed extensions of Jonckheere's test. In order to test for an increasing trend in the k-sample problem, Abel (1986) gave one of the few generalizations of Jonckheere's test for interval-censored data. We suggest further Jonckheere-type tests in line with the tests presented in Chapter 4, using permutational and Monte Carlo approaches. We provide computer programs for each proposal and perform a simulation study to compare their power under different parametric assumptions and trend alternatives. Both chapters are motivated by the analysis of data from a study of the benefits of zidovudine in patients in the early stages of HIV infection (Volberding et al., 1995). Finally, Chapter 6 summarizes the results and points out those aspects that remain to be completed.
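The simplified likelihood for interval-censored data mentioned above is usually maximized with Turnbull's self-consistency (EM) iteration. The sketch below is a generic illustration in Python rather than the S-Plus used in the thesis; it places mass on a user-supplied set of candidate support points and assumes every observed interval contains at least one of them.

```python
def turnbull_em(intervals, support, n_iter=500):
    """Self-consistency (EM) iteration for the NPMLE of an interval-censored
    distribution, with probability mass restricted to the given support points.
    intervals: list of (left, right) observation intervals.
    Assumes each interval contains at least one support point."""
    n, m = len(intervals), len(support)
    p = [1.0 / m] * m                       # start from the uniform distribution
    member = [[1 if l <= s <= r else 0 for s in support]
              for (l, r) in intervals]      # which support points each interval covers
    for _ in range(n_iter):
        new = [0.0] * m
        for i in range(n):
            denom = sum(p[j] * member[i][j] for j in range(m))
            for j in range(m):
                if member[i][j]:            # E-step: share observation i's mass
                    new[j] += p[j] / denom / n
        p = new                             # M-step: averaged expected counts
    return p
```

Each iteration redistributes every observation's unit mass over the support points its interval covers, proportionally to the current estimate; a fixed point of this map maximizes the simplified likelihood.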
233

Probabilistic Models for Life Cycle Management of Energy Infrastructure Systems

Datla, Suresh Varma 04 July 2007 (has links)
The degradation of aging energy infrastructure systems has the potential to increase the risk of failure, resulting in power outages and costly unplanned maintenance work. The development of scientific, cost-effective life cycle management (LCM) strategies has therefore become increasingly important for maintaining energy infrastructure. Since degradation of aging equipment is an uncertain process that depends on many factors, a risk-based approach is required to account for the various uncertainties in LCM. The thesis presents probabilistic models to support risk-based life cycle management of energy infrastructure systems. In addition to uncertainty in the degradation process, the inspection data collected by the energy industry are often censored and truncated, which makes it difficult to estimate the lifetime probability distribution of the equipment. The thesis presents modern statistical techniques for quantifying the uncertainties associated with inspection data and estimating lifetime distributions in a consistent manner. Age-based and sequential inspection-based replacement models are proposed for the maintenance of components in a large distribution network. A probabilistic lifetime model for the effect of imperfect preventive maintenance of a component is developed, and its impact on maintenance optimization is illustrated. The thesis also presents a stochastic model for the pitting corrosion process in steam generators (SG), a serious form of degradation in the SG tubing of some nuclear generating stations. The model is applied to estimate the number of tubes requiring plugging and the probability of tube leakage in an operating period. The application and benefits of the model are illustrated in the context of managing the life cycle of a steam generator.
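As one concrete illustration of fitting a lifetime distribution to right-censored inspection data, here is a crude maximum-likelihood sketch for the Weibull model. It is a stand-in, not the thesis's methodology: it uses a grid search instead of a proper optimizer and omits the truncation handling the abstract mentions.

```python
import math
import random

def weibull_loglik(shape, scale, times, events):
    """Weibull log-likelihood; a right-censored record (event=0)
    contributes only the log-survival term -(t/scale)**shape."""
    ll = 0.0
    for t, e in zip(times, events):
        z = (t / scale) ** shape
        if e:
            ll += math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
        else:
            ll -= z
    return ll

def fit_weibull(times, events):
    """Crude grid-search MLE -- adequate for a sketch, not for production use."""
    best = None
    for shape in [0.5 + 0.1 * i for i in range(26)]:       # shapes 0.5 .. 3.0
        for scale in [1.0 + 0.25 * i for i in range(45)]:  # scales 1.0 .. 12.0
            ll = weibull_loglik(shape, scale, times, events)
            if best is None or ll > best[0]:
                best = (ll, shape, scale)
    return best[1], best[2]
```

With exponential data (true shape 1, scale 5) and administrative censoring, the fitted shape should land near 1 and the scale near 5, up to grid resolution.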
235

Marginal Methods for Multivariate Time to Event Data

Wu, Longyang 05 April 2012 (has links)
This thesis considers a variety of statistical issues related to the design and analysis of clinical trials involving multiple lifetime events. The use of composite endpoints, multivariate survival methods with dependent censoring, and recurrent events with dependent termination are considered. Much of this work is based on problems arising in oncology research. Composite endpoints are routinely adopted in multi-centre randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. In Chapter 2 we consider this issue in the context of multivariate models for time to event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. The limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. While there is considerable potential for more powerful tests of treatment effect when marginal methods are used, it is possible that problems related to dependent censoring can arise. 
This happens when the occurrence of one type of event increases the risk of withdrawal from a study and hence alters the probability of observing events of other types. The purpose of Chapter 3 is to formulate a model which reflects this type of mechanism, to evaluate the effect of dependent censoring on the asymptotic and finite-sample properties of marginal estimates, and to examine the performance of estimators obtained using flexible inverse probability weighted marginal estimating equations. Data from a motivating study are used for illustration. Clinical trials are often designed to assess the effect of therapeutic interventions on the occurrence of recurrent events in the presence of a dependent terminal event such as death. Statistical methods based on multistate analysis have considerable appeal in this setting since they can incorporate changes in risk with each event occurrence, dependence between the recurrent and terminal events, and event-dependent censoring. To date, however, there has been limited methodology for the design of trials involving recurrent and terminal events, and we address this in Chapter 4. Based on the asymptotic distribution of regression coefficients from a multiplicative intensity Markov regression model, we derive sample size formulae to address power requirements for both the recurrent and terminal event processes; both superiority and non-inferiority trial designs are covered. Simulation studies confirm that the designs satisfy the nominal power requirements in both settings, and an application to a trial evaluating the effect of a bisphosphonate on skeletal complications is given for illustration.
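The sample-size derivation in Chapter 4 is for a multistate recurrent/terminal-event model; for orientation, the classical single-event analogue (Schoenfeld's formula for the log-rank test under 1:1 allocation) can be sketched as follows. The hazard ratio, alpha and power arguments below are generic placeholders, not values from the thesis.

```python
import math
from statistics import NormalDist

def events_required(hr, alpha=0.05, power=0.80):
    """Schoenfeld's formula: total events needed for a two-sided log-rank
    test at level alpha with the given power, 1:1 allocation."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return 4 * (za + zb) ** 2 / math.log(hr) ** 2

def subjects_required(hr, p_event, alpha=0.05, power=0.80):
    """Total sample size: required events divided by the average
    probability that a subject experiences an event during the trial."""
    return math.ceil(events_required(hr, alpha, power) / p_event)
```

For example, detecting a hazard ratio of 0.67 with 80% power at a two-sided 5% level needs roughly 195 events; the smaller the effect (HR closer to 1), the more events are required.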
236

A Study of Recidivism Prediction Models for Women Drug Prisoners

Yang, Chin-liang 13 August 2012 (has links)
This paper constructs recidivism prediction models for women drug prisoners, using the 10 factors evaluated in the "drug recidivism risk assessment form" by correctional institutions and 18 factors studied in the literature. The new recidivism prediction models aim to improve the accuracy with which recidivism among women drug prisoners is predicted. The sample includes 1,029 drug prisoners released from Kaohsiung Women's Prison between 2008 and 2011, with all criminal records traced until the end of 2011. Two sets of potential risk factors of recidivism are considered: the first contains only the factors in the evaluation form, and the second includes all relevant factors. Using logistic regression analysis and survival analysis, the effects of the potential risk factors on recidivism are examined, and both the probability and the time interval of recidivism are predicted. Using the logistic regression model with the risk factors from the evaluation form alone, 58.4% of recidivism can be correctly predicted; extending the set of potential risk factors raises the screening rate to 73.3%. In survival analysis, median forecasts are far superior to mean forecasts. With the risk factors from the evaluation form, the difference between the predicted and actual recidivism dates is less than 60 days for 2.5% of the sample and less than 180 days for 9.6%. With all relevant risk factors, these shares improve significantly to 10.2% and 27.3% respectively.
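The logistic-regression step above can be sketched in pure Python. The gradient-ascent fitter below is entirely generic: the toy data, learning rate and threshold are illustrative, not the study's evaluation-form factors or results.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient ascent on the
    log-likelihood. X: list of feature vectors; y: 0/1 labels.
    Returns weights [intercept, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            err = yi - p                     # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Classify at the 0.5 probability threshold (i.e. linear score > 0)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z > 0 else 0
```

In practice one would also hold out data to estimate the screening rate, as the paper does when comparing the two factor sets.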
237

Determinants Of Infant Mortality In Turkey

Seckin, Nutiye 01 October 2009 (has links) (PDF)
The infant mortality rate (IMR) is used as an indicator of a nation's economic welfare. Despite the tremendous reduction since the 1900s, the infant mortality rate is still high in developing countries. In Turkey, infant mortality fell from 67 to 21 per 1,000 live births in the 17 years from 1990 to 2007. However, the IMR in Turkey is still much higher than the rate in developed countries, reported as 5 per 1,000 in 2007. In this thesis, I examine the regional, household and individual level characteristics associated with infant mortality, using survival analysis. The data come from the 2003-2004 Turkey Demographic and Health Survey, which includes detailed information on 8,075 ever-married women between the ages of 15 and 49; 7,360 of these women were mothers, who gave birth to 22,443 children. The results of the logistic regression show that intervals between births are associated with infant mortality at lower levels of the wealth index: children from poorer families with a preceding birth interval shorter than 14 months, or whose mothers experience a subsequent birth, fare badly. Breastfeeding is important for the survival chances of infants under 3 months of age. The place of delivery and the family's source of water are also found to be correlated with infant mortality risk. A curvilinear relation between maternal age at birth and infant mortality risk is observed, indicating higher risk for teenage mothers and for mothers having children at older ages.
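The curvilinear relation described above is typically captured by adding a quadratic maternal-age term to the logistic model. The tiny sketch below uses hypothetical coefficients (not estimates from the survey) to show why risk is then U-shaped in age, with its minimum at the vertex -b1/(2*b2).

```python
import math

def mortality_risk(age, b0, b1, b2):
    """Predicted probability under logit(p) = b0 + b1*age + b2*age**2.
    A negative b1 with positive b2 gives the U-shape described above."""
    z = b0 + b1 * age + b2 * age ** 2
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only; the vertex of the
# quadratic, -b1/(2*b2), is the age of lowest predicted risk.
b0, b1, b2 = -1.0, -0.216, 0.004
lowest_risk_age = -b1 / (2 * b2)   # 27.0 for these values
```

With these made-up coefficients, both a teenage mother (age 17) and an older mother (age 40) face higher predicted risk than a mother at the vertex age.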
238

Autologous Stem Cell Transplantation in Elderly Patients with Non-Hodgkin's Lymphoma

Green, Joel Robert 23 November 2009 (has links)
Clinical trials investigating autologous stem cell transplantation (ASCT) have historically excluded elderly patients due to the risk of treatment-related morbidity from the administration of high-dose chemotherapy. While the availability of this procedure continues to expand, the elderly still represent a population for which the role of ASCT needs to be fully defined. 201 patients who underwent ASCT for Non-Hodgkin's lymphoma (NHL) at a single institution following BEAM conditioning between January 1, 2000 and December 31, 2007 were retrospectively identified from the Yale University School of Medicine Bone Marrow Transplant Database. 67 patients were older than 60 years at the time of transplantation (median age 65, range 60-75) and were compared to a matched group of 134 patients transplanted during the same time period. The groups were extremely well matched for demographics such as gender, NHL histology, performance status, and comorbidities. Most patients had advanced-stage disease at diagnosis and were transplanted at first or second remission. Diffuse large B-cell and mantle cell lymphoma were the most common subtypes, but other subtypes were represented. The elderly group experienced significantly more serious toxicities within the first 100 days (63% vs. 42% in the control group, p<0.0001); however, there were no statistical differences between the groups regarding specific organ system toxicities. The 1-year non-relapse mortality in the elderly group (3%) was not significantly different from that of the younger cohort (1%). At a median follow-up of 31 months, the median overall survival in the elderly group is 85 months; at a median follow-up of 33 months in the younger group, the median overall survival has not yet been reached. The overall survival at 3 years is 74% and 75% respectively (p=0.91), and the disease-free survival at 3 years is 48% in the elderly group compared to 58% in the control group (p=0.66). By univariate analysis, age >60 years (RR 3.1, 95% CI 1.7-5.7, p=0.004) was the only factor predictive of developing a serious toxicity from ASCT within the first 100 days, and HCT-CI score (RR 2, 95% CI 1-4, p=0.043) was the only factor associated with significantly worse overall survival. Autologous stem cell transplantation can be safely performed in selected patients older than 60 years with chemosensitive NHL. Although elderly patients appear more likely to develop acute toxicities, outcomes are similar to those of younger patients with respect to non-relapse mortality, disease-free survival, and overall survival.
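Survival comparisons like the p=0.91 for 3-year overall survival quoted above are typically made with the two-sample log-rank test. The following is a compact, generic sketch of that test (not the study's actual analysis code): it returns a chi-square statistic on 1 degree of freedom, with values above 3.84 significant at the 5% level.

```python
def logrank(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic (1 df).
    events: 1 = event, 0 = right-censored."""
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    O1 = E1 = V = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                  # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)      # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)            # events at t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        O1 += d1                         # observed events in group 1
        E1 += d * n1 / n                 # expected under equal hazards
        if n > 1:                        # hypergeometric variance term
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (O1 - E1) ** 2 / V
```

Two identical samples give a statistic of zero; clearly separated survival curves give a large one.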
239

Enhancing Gene Expression Signatures in Cancer Prediction Models: Understanding and Managing Classification Complexity

Kamath, Vidya P. 29 July 2010 (has links)
Cancer can develop through a series of genetic events in combination with external influential factors that alter the progression of the disease. Gene expression studies are designed to provide an enhanced understanding of the progression of cancer and to develop clinically relevant biomarkers of disease, prognosis and response to treatment. One of the main aims of microarray gene expression analyses is to develop signatures that are highly predictive of specific biological states, such as the molecular stage of cancer. This dissertation analyzes the classification complexity inherent in gene expression studies, proposing both techniques for measuring complexity and algorithms for reducing it. Classifier algorithms that generate predictive signatures of cancer models must generalize to independent datasets for successful translation to clinical practice. The predictive performance of classifier models is shown to depend on the inherent complexity of the gene expression data. Three specific quantitative measures of classification complexity are proposed, and one measure (f) is shown to correlate highly (R² = 0.82) with classifier accuracy in experimental data. Three quantization methods are proposed to enhance contrast in gene expression data and reduce classification complexity. The accuracy of cancer prognosis prediction is shown to improve with quantization in the two datasets studied: from 67% to 90% in lung cancer and from 56% to 68% in colorectal cancer. A corresponding reduction in classification complexity is also observed. A random-subspace-based multivariable feature selection approach using cost-sensitive analysis is proposed to model the underlying heterogeneous cancer biology and to address complexity due to multiple molecular pathways and the unbalanced distribution of samples into classes. The technique is shown to be more accurate than the univariate t-test method, improving classifier accuracy from 56% to 68% for colorectal cancer prognosis prediction. A published gene expression signature for predicting the radiosensitivity of tumor cells is augmented with clinical indicators to enhance modeling of the data and represent the underlying biology more closely. Statistical tests and experiments indicate that the improvement in model fit results from modeling the underlying biology rather than from statistical over-fitting of the data, thereby accommodating classification complexity through the use of additional variables.
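The abstract's quantization idea (discretizing continuous expression values to enhance contrast) can be illustrated with a minimal sketch. The dissertation's three specific quantization methods are not described here, so this uses generic per-gene quantile binning on invented data; the matrix sizes, level count, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy expression matrix: 50 samples x 200 genes (hypothetical values).
X = rng.normal(size=(50, 200))

def quantize(X, n_levels=3):
    """Per-gene quantile binning: map each continuous expression value
    to a discrete level 0..n_levels-1 relative to that gene's own
    distribution, enhancing contrast between samples."""
    # Interior quantile thresholds per gene, shape (n_levels-1, n_genes).
    qs = np.quantile(X, np.linspace(0, 1, n_levels + 1)[1:-1], axis=0)
    # A value's level = how many thresholds it exceeds.
    return (X[None, :, :] > qs[:, None, :]).sum(axis=0)

Xq = quantize(X)
print(Xq.min(), Xq.max())  # discrete levels 0..2
```

A downstream classifier would then be trained on `Xq` instead of `X`; whether that reduces classification complexity in the dissertation's sense depends on its specific complexity measures, which this sketch does not implement.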
240

A Monte Carlo Approach to Change Point Detection in a Liver Transplant

Makris, Alexia Melissa 01 January 2013 (has links)
Patient survival post liver transplant (LT) is important both to the patient and to the center's accreditation, but over the years physicians have noticed that distant patients struggle with post-LT care. I hypothesized that a patient's distance from the transplant center had a detrimental effect on post-LT survival. I suspected that Hepatitis C (HCV) and Hepatocellular Carcinoma (HCC) patients would deteriorate due to their recurrent disease and would need close monitoring post LT. From the current literature it was not clear whether patients' distance from a transplant center affects outcomes post LT. Firozvi et al. (Firozvi AA, 2008) reported no difference in outcomes of LT recipients living 3 hours away or less. This study aimed to examine outcomes of LT recipients based on distance from a transplant center. I hypothesized that the effect of distance from an LT center was detrimental after adjusting for HCV and HCC status. Methods: This was a retrospective single-center study of LT recipients transplanted between 1996 and 2012; 821 LT recipients who qualified for inclusion in the study were identified. Survival analysis was performed using standard methods as well as a newly developed Monte Carlo (MC) approach to change point detection. My new methodology allowed for detection of a change point in both distance and time by maximizing the two-parameter score function (M2p) over a two-dimensional grid of distance and time values. Extensive simulations using both standard distributions and data resembling the LT data structure were used to demonstrate the functionality of the model. Results: Five-year survival was 0.736 with a standard error of 0.018. Using Cox PH, it was demonstrated that patients living beyond 180 miles had a hazard ratio (HR) of 2.68 (p-value < 0.004) compared to those living within 180 miles of the transplant center.
I was able to confirm these results using KM and HCV/HCC-adjusted AFT, while HCV- and HCC-adjusted LR confirmed the distance effect at 180 miles (p=0.0246) one year post LT. The new statistic, which has been labeled M2p, allows for simultaneous dichotomization of distance in conjunction with the identification of a change point in the hazard function. It performed much better than previously available statistics in the standard simulations. The best model for the data was found to be extension 3, which dichotomizes the distance Z, replacing it by I(Z>c), and then estimates the change point c and tau. Conclusions: Distance had a detrimental effect, and this effect was observed at 180 miles from the transplant center. Patients living beyond 180 miles from the transplant center had 2.68 times the death rate of those living within the 180-mile radius. Recipients with HCV fared the worst, with the distance effect being more pronounced (HR of 3.72 vs. 2.68). Extensive simulations using different parameter values, in both standard settings and settings resembling the LT data, demonstrated that these new approaches work for dichotomizing a continuous variable and finding a point beyond which the variable has an incremental effect. The recovered values were very close to the true values, and the p-values were small.
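The grid-maximization idea behind the abstract's dichotomization of distance can be sketched as follows. This is a hedged illustration, not the dissertation's M2p statistic: it searches only over candidate distance cutoffs c (the full method also searches over a time change point tau), it scores each cutoff with a simple two-sample log-rank statistic on simulated uncensored data, and every number (cohort size, rates, the true cutoff of 180 miles) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cohort: distance Z uniform on (0, 400) miles; the death
# rate doubles beyond a true cutoff of 180 miles (illustrative only).
n = 2000
Z = rng.uniform(0, 400, n)
rate = np.where(Z > 180, 2.0, 1.0)
T = rng.exponential(1.0 / rate)  # event times (no censoring, for simplicity)

def logrank_stat(time, group):
    """Two-sample log-rank statistic for a 0/1 group indicator,
    assuming every subject has an event and there are no ties."""
    order = np.argsort(time)
    t, g = time[order], group[order]
    n_tot = len(t)
    at_risk1 = np.cumsum(g[::-1])[::-1]    # group-1 subjects still at risk
    at_risk = n_tot - np.arange(n_tot)     # all subjects still at risk
    obs = g.astype(float)                  # 1 if the event is in group 1
    exp = at_risk1 / at_risk               # expected under no group effect
    var = exp * (1 - exp)                  # hypergeometric variance, d=1
    return (obs - exp).sum() / np.sqrt(var.sum())

# Grid search: dichotomize Z as I(Z > c) at each candidate cutoff c and
# keep the cutoff whose score statistic is largest in magnitude.
grid = np.arange(100, 301, 10)
scores = [abs(logrank_stat(T, (Z > c).astype(int))) for c in grid]
c_hat = grid[int(np.argmax(scores))]
print(c_hat)  # should land near the simulated cutoff of 180
```

The dissertation's approach additionally maximizes over a time change point and uses Monte Carlo methods to calibrate the resulting maximally selected statistic, neither of which this sketch attempts.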
