[Abstract]Microarray data classification is used primarily to predict unseen data using a model built on categorized existing Microarray data. One of the major challenges is that Microarray data contains a large number of genes with a small number of samples. This high dimensionality problem has prevented many existing classification methods from directly dealing with this type of data. Moreover, the small number of samples increases the overfitting problem of Classification, as a result leading to lower accuracy classification performance. Another major challenge is that of the uncertainty of Microarraydata quality. Microarray data contains various levels of noise and quite often high levels of noise, and these data lead to unreliable and low accuracy analysis as well as the high dimensionality problem. Most current classification methods are not robust enough to handle these type of data properly.In our research, accuracy and noise resistance or robustness issues are focused on. Our approach is to design a robust classification method for Microarray data classification.An algorithm, called diversified multiple decision trees (DMDT) is proposed, which makes use of a set of unique trees in the decision committee. The DMDT method has increased the diversity of ensemble committees andtherefore the accuracy performance has been enhanced by avoiding overlapping genes among alternative trees.Some strategies to eliminate noisy data have been looked at. Our method ensures no overlapping genes among alternative trees in an ensemble committee, so a noise gene included in the ensemble committee can affect onetree only; other trees in the committee are not affected at all. This design increases the robustness of Microarray classification in terms of resistance to noise data, and therefore reduces the instability caused by overlapping genes in current ensemble methods.The effectiveness of gene selection methods for improving the performance of Microarray classification methods are also discussed.We conclude that the proposed method DMDT substantially outperforms the other well-known ensemble methods, such as Bagging, Boosting and Random Forests, in terms of accuracy and robustness performance. DMDT is more tolerant to noise than Cascading-and-Sharing trees (CS4), particularywith increasing levels of noise in the data. The results also indicate that some classification methods are insensitive to gene selection while some methodsdepend on particular gene selection methods to improve their performance of classification.
Identifer | oai:union.ndltd.org:ADTP/256980 |
Date | January 2008 |
Creators | Hu, Hong |
Publisher | University of Southern Queensland, Faculty of Sciences |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | http://www.usq.edu.au/eprints/terms_conditions.htm, (c) Copyright 2008 Hong Hu |
Page generated in 0.0012 seconds