
Gaussian process regression models for the analysis of survival data with competing risks, interval censoring and high dimensionality

We develop novel statistical methods for analysing biomedical survival data based on Gaussian process (GP) regression. GP regression provides a powerful non-parametric probabilistic method of relating inputs to outputs. We apply this to survival data, which consist of time-to-event and covariate measurements. In the context of GP regression the covariates are regarded as 'inputs' and the event times are the 'outputs'. This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods for analysing survival data, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate, which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models, and as such our focus is on directly characterising the relationship between the covariates and event times without any explicit assumptions on what form the hazard rates take. This provides a more direct route to connecting the covariates to survival outcomes with minimal assumptions. An application of our model to experimental data illustrates its usefulness. We then apply multiple output GP regression, which can handle multiple potentially correlated outputs for each input, to competing risks survival data where multiple event types can occur. In this case the multiple outputs correspond to the time-to-event for each risk. By tuning one of the model parameters we can control the extent to which the multiple outputs are dependent, thus allowing the specification of correlated risks. However, the identifiability problem, which states that it is not possible to infer whether risks are truly independent or otherwise on the basis of observed data, still holds. In spite of this fundamental limitation, simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions.
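As a rough illustration of treating covariates as GP inputs and event times as GP outputs, the sketch below fits a zero-mean GP to centred log event times and returns the posterior mean and variance at new covariate values. It is not the model developed in the thesis: the squared-exponential kernel, the log transform of the event times, the fixed hyperparameters, and the function names `rbf_kernel` and `gp_predict_log_time` are all assumptions made for this example, and censoring is ignored entirely.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict_log_time(x_train, log_t_train, x_test, noise_var=0.1):
    """Posterior mean and variance of the latent log event time at x_test.

    Assumes fully observed (uncensored) event times and fixed,
    hand-picked kernel hyperparameters.
    """
    offset = log_t_train.mean()  # centre outputs so a zero-mean prior is reasonable
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, log_t_train - offset))
    mean = k_st @ alpha + offset
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var
```

Given an `(n, d)` covariate array and a length-`n` vector of log event times, `gp_predict_log_time(x, log_t, x_new)` returns one predictive mean and variance per row of `x_new`. Handling censored observations would require replacing the Gaussian likelihood used here with one that accounts for censoring.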
The second part of this thesis is concerned with high dimensional survival data where there are a large number of covariates compared to relatively few individuals. This leads to the problem of overfitting, where spurious relationships are inferred from the data. One strategy to tackle this problem is dimensionality reduction. The Gaussian process latent variable model (GPLVM) is a powerful method of extracting a low dimensional representation of high dimensional data. We extend the GPLVM to incorporate survival outcomes by combining the model with a Weibull proportional hazards model (WPHM). By reducing the ratio of covariates to samples we hope to diminish the effects of overfitting. The combined GPLVM-WPHM model can also be used to combine several datasets by simultaneously expressing them in terms of the same low dimensional latent variables. We construct the Laplace approximation of the marginal likelihood and use this to determine the optimal number of latent variables, thereby allowing detection of intrinsic low dimensional structure. Results from both simulated and real data show a reduction in overfitting and an increase in predictive accuracy after dimensionality reduction.
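To make the survival component concrete, the sketch below evaluates a right-censored Weibull proportional hazards log-likelihood on a set of low dimensional latent variables. The parametrisation is an assumption for this example, not taken from the thesis: the hazard is taken to be h(t | z) = (k / lam) * (t / lam)^(k - 1) * exp(beta . z), with shape k, scale lam and regression weights beta. The function name `weibull_ph_log_likelihood` is hypothetical, and the GPLVM part of the combined model is not shown.

```python
import numpy as np


def weibull_ph_log_likelihood(times, events, latents, beta, shape, scale):
    """Right-censored Weibull proportional hazards log-likelihood.

    times   : (n,) positive event or censoring times
    events  : (n,) 1 if the event was observed, 0 if right-censored
    latents : (n, q) low dimensional latent variables used as covariates
    beta    : (q,) regression weights
    shape, scale : positive Weibull parameters (assumed parametrisation)
    """
    linear = latents @ beta
    log_hazard = (
        np.log(shape / scale) + (shape - 1.0) * np.log(times / scale) + linear
    )
    cum_hazard = (times / scale) ** shape * np.exp(linear)
    # Observed events contribute log h(t) - H(t); censored ones contribute -H(t).
    return np.sum(events * log_hazard - cum_hazard)
```

With `shape = 1`, `scale = 1` and `beta = 0` the model reduces to a unit-rate exponential, so the log-likelihood is simply minus the sum of the observed times, which gives a quick sanity check.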

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:677163
Date January 2015
Creators Barrett, James Edward
Contributors Kuehn, Reimer ; Coolen, Anthonius Clara Caspar
Publisher King's College London (University of London)
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
Source http://kclpure.kcl.ac.uk/portal/en/theses/gaussian-process-regression-models-for-the-analysis-of-survival-data-with-competing-risks-interval-censoring-and-high-dimensionality(fe3440e1-9766-4fc3-9d23-fe4af89483b5).html
