
Development of new knowledge discovery tools to explore biomedical datasets in breast cancer

The explorative power of high-throughput technologies in cancer research has become well established in recent years, as exemplified by diverse gene microarray studies. However, development of the necessary biomedical data analysis tools has historically been confined to commercial environments, and comprehensive, user-friendly analysis approaches are still needed. In this project, the availability of free software, notably the 'R' project statistical programming language, allowed development of a user-friendly multivariate statistics application, Informatics Tenovus (I-10). I-10 provides a platform through which powerful existing and future 'R' project statistical analysis methodologies can be applied without prior programming knowledge. The new system was tested in the context of exploring antihormone resistance in breast cancer, analysing microarray datasets from in vitro models of acquired tamoxifen resistance (TAMR) or Faslodex resistance (FASR) versus endocrine-responsive MCF-7 cells. The analysis revealed not only known deregulated genes but also further potential markers and targets for endocrine response and resistance. The advantages of combining the 'R' programming environment with Microsoft Visual Basic .NET technology for producing user-friendly biomedical analysis tools facilitated the subsequent development of a tool for exploring SEER cancer patient datasets. This new cancer survival query tool, Superstes, allows detailed statistical modelling of the impact that multiple patient attributes (in this instance derived from the SEER breast and colorectal cancer datasets) have on patient survival. The versatility of 'R' was additionally demonstrated in further exploring classifiers, where it was able to interface with the sophisticated, freely available machine learning application 'Weka'. Using 'R' and Weka, breast cancer patient survival was modelled using patient attributes equivalent to those of the Nottingham Prognostic Index and a 10-year survival subset of the SEER breast cancer dataset. Several machine learning methodologies were compared for their ability to model survival accurately, and their value in routine clinical use for predicting patient survival was then critically evaluated.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:584528
Date: January 2009
Creators: Hill, Nathan Stuart
Publisher: Cardiff University
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://orca.cf.ac.uk/54473/
