Comparative Analysis of Thresholding Algorithms for Microarray-derived Gene Correlation Matrices

The thresholding problem is important in today's data-rich research environment. A threshold is a well-defined point in the data distribution beyond which the data is highly likely to have scientific meaning. The selection of a threshold is crucial, since it heavily influences any downstream analysis and the inferences drawn from it. A legitimate threshold is one that is not arbitrary but scientifically well grounded and data-dependent, and that best separates the information-rich and noisy sections of the data. Although the thresholding problem is not restricted to any particular field of study, little research has been done on it. This study investigates the problem in the context of network-based analysis of transcriptomic data. Six conceptually diverse algorithms (based on the number of maximal cliques, correlations of control spots with genes, the top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power) are used to threshold the gene correlation matrices of three time-series microarray datasets and are tested for stability and validity. The stability, or reliability, of the first four algorithms is tested by block bootstrapping the arrays in each dataset and comparing the estimated thresholds against the bootstrap threshold distributions. The validity of the thresholding algorithms is tested by comparing the estimated thresholds against a threshold based on biological information. Thresholds derived from the modular structure of gene networks are concluded to perform better in terms of both stability and validity. Future challenges in researching the problem are identified. Although the study uses transcriptomic data for its analysis, we assert its applicability to thresholding across various fields.
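As a rough illustration of the simplest of the six approaches, the top 1% of correlations, the following sketch thresholds a gene correlation matrix computed from synthetic data. It is not the thesis's implementation: the function names, the use of absolute Pearson correlation, the random expression values, and the dataset size (50 genes, 12 arrays) are all assumptions made for the example.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_fraction_threshold(expr, fraction=0.01):
    """Smallest |r| among the strongest `fraction` of gene pairs.

    Assumption: correlations are ranked by absolute value, so strong
    negative correlations count as much as strong positive ones.
    """
    abs_r = sorted(
        (abs(pearson(expr[i], expr[j]))
         for i, j in combinations(range(len(expr)), 2)),
        reverse=True,
    )
    k = max(1, int(len(abs_r) * fraction))
    return abs_r[k - 1]


def threshold_network(expr, threshold):
    """Gene pairs (edges) whose |r| meets or exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(expr)), 2)
        if abs(pearson(expr[i], expr[j])) >= threshold
    ]


# Synthetic stand-in for a microarray dataset: 50 genes x 12 arrays.
random.seed(0)
expr = [[random.gauss(0.0, 1.0) for _ in range(12)] for _ in range(50)]

thr = top_fraction_threshold(expr, fraction=0.01)
edges = threshold_network(expr, thr)
print(f"threshold |r| = {thr:.3f}, edges kept = {len(edges)}")
```

A fixed-fraction cutoff like this always keeps the same share of gene pairs regardless of how much signal the data contains, which is one reason to compare it against data-dependent alternatives such as those the study examines.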

Identifier: oai:union.ndltd.org:UTENN/oai:trace.tennessee.edu:utk_gradthes-1399
Date: 01 August 2008
Creators: Borate, Bhavesh Ram
Publisher: Trace: Tennessee Research and Creative Exchange
Source Sets: University of Tennessee Libraries
Detected Language: English
Type: text
Source: Masters Theses