Genome-wide microarray technology has facilitated the systematic discovery of diagnostic biomarkers of cancers and other pathologies. However, meta-analyses of published arrays using melanoma as a test cancer have uncovered significant inconsistencies that hinder advances in clinical practice. In this study, a computational model for the integrated analysis of microarray datasets is proposed to provide a robust ranking of genes by two measures of significance: genome-wide relative significance (GWRS) and genome-wide global significance (GWGS).
Applying the model to five melanoma microarray datasets published between 2000 and 2011 defined a new 12-gene diagnostic biomarker signature for melanoma (EGFR, FGFR2, FGFR3, IL8, PTPRF, TNC, CXCL13, COL11A1, CHP2, SHC4, PPP2R2C, and WNT4). Of these, CXCL13, COL11A1, PTPRF and SHC4 are components of the MAPK pathway and were validated by immunocyto- and immunohisto-chemistry. These proteins were found to be overexpressed in metastatic and primary melanoma cells in vitro and in melanoma tissue in situ, compared to melanocytes cultured from healthy skin epidermis and to normal healthy human skin.
One challenge for the integrated analysis of microarray data is that the data are produced using different platforms and bio-samples, e.g., both cell line- and biopsy-based microarray datasets. To address this challenge, the computational model was further enhanced by stratifying datasets into biopsy-derived or cell line-derived datasets, and by weighting microarray data according to data quality criteria. The enhanced method was applied to 14 microarray datasets of three cancers (breast, prostate, and melanoma) and evaluated on classification accuracy and on its capability to identify predictive biomarkers. Four novel measures for evaluating the capability to identify predictive biomarkers are proposed: (1) classifying independent testing data using wrapper feature selection with machine learning, (2) assessing the number of genes in common with the genes retrieved in independent testing data, (3) assessing the number of genes in common with the genes retrieved across multiple training datasets, and (4) assessing the number of genes in common with the genes validated in the literature.
The enhanced computational approach (i) achieved reliable classification performance across multiple datasets, (ii) placed more significant genes among the top-ranked genes when compared with the genes detected using the independent test data, and (iii) detected more meaningful genes that were validated in previous melanoma studies in the literature.
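The gene-overlap measures (2) to (4) described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `top_k_overlap` and `validated_hits`, the cutoff `k`, and the placeholder gene identifiers are all assumptions introduced here for illustration, and the rankings are toy inputs rather than results.

```python
def top_k_overlap(ranked_a, ranked_b, k):
    """Return the genes shared by the top-k entries of two ranked gene lists."""
    return set(ranked_a[:k]) & set(ranked_b[:k])


def validated_hits(ranked, validated, k):
    """Return the top-k ranked genes that also appear in a validated gene set."""
    return [gene for gene in ranked[:k] if gene in validated]


# Hypothetical rankings (placeholder identifiers, not real results).
training_rank = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
testing_rank = ["GENE_B", "GENE_A", "GENE_F", "GENE_E", "GENE_C"]
literature_validated = {"GENE_A", "GENE_C"}  # assumed literature-validated set

# Measures (2) and (3): genes common to the top-k of two rankings.
shared = top_k_overlap(training_rank, testing_rank, k=3)

# Measure (4): top-k genes that are also validated in the literature.
hits = validated_hits(training_rank, literature_validated, k=3)

print(sorted(shared))
print(hits)
```

A larger overlap at a fixed `k` would suggest that a ranking is more stable across datasets, which appears to be the intuition behind counting common genes.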
Identifier | oai:union.ndltd.org:BRADFORD/oai:bradscholars.brad.ac.uk:10454/7346 |
Date | January 2014 |
Creators | Liu, Wanting |
Contributors | Peng, Yonghong, Tobin, Desmond J. |
Publisher | University of Bradford, School of Engineering and Informatics |
Source Sets | Bradford Scholars |
Language | English |
Detected Language | English |
Type | Thesis, doctoral, PhD |
Rights | The University of Bradford theses are licenced under a Creative Commons Licence (http://creativecommons.org/licenses/by-nc-nd/3.0/). |