
Coefficient of intrinsic dependence: a new measure of association

Detecting dependence among variables is an essential task in many scientific
investigations. In this study we propose a new measure of association, the coefficient
of intrinsic dependence (CID), which takes values in [0,1] and faithfully reflects the full
range of dependence for two random variables. The CID is free of distributional and
functional assumptions. It can be easily implemented and extended to multivariate
situations.
Traditionally, the correlation coefficient is the preferred measure of association.
However, its effectiveness is considerably compromised when the random variables
are not normally distributed. Moreover, the interpretation of the correlation coefficient
is difficult when the data are categorical. By contrast, the CID is free of these problems.
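A related and well-known limitation of the correlation coefficient can be illustrated with a small sketch. This is only an illustration under stated assumptions: the abstract does not define the CID, so the code below does not implement it. The helper `pearson` and the example data are hypothetical and are not taken from the thesis. The sketch shows that the sample correlation is close to 1 for a linear relationship but close to 0 for a curvilinear one, even though in both cases one variable is completely determined by the other.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical data: a grid on [-1, 1] that is symmetric about zero.
xs = [i / 50 for i in range(-50, 51)]

# Both responses are deterministic functions of x, i.e. fully dependent on it.
linear = [2 * x + 1 for x in xs]   # linear relationship
curved = [x ** 2 for x in xs]      # curvilinear relationship

r_linear = pearson(xs, linear)
r_curved = pearson(xs, curved)

print(f"linear:      r = {r_linear:.3f}")
print(f"curvilinear: r = {r_curved:.3f}")
```

The curvilinear case is the kind of dependence that a correlation-based screen would overlook, which is the gap a measure free of functional assumptions is meant to close.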
In our simulation studies, we find that the ability of the CID to differentiate
levels of dependence remains robust across data types (categorical
or continuous) and model features (linear or curvilinear). Also, the CID is particularly
effective when the dependence is strong, making it a powerful tool for variable
selection.
As an illustration, the CID is applied to variable selection in two aspects: classification
and prediction. The analysis of actual data from a study of breast cancer gene expression
is included. For the classification problem, we identify a pair of genes that best
classifies
a patient's prognosis signature, and for the prediction problem, we identify a
pair of genes that best relates to the expression of a specific gene.

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/2397
Date: 29 August 2005
Creators: Liu, Li-yu Daisy
Contributors: Hsing, Tailen
Publisher: Texas A&M University
Source Sets: Texas A and M University
Language: en_US
Detected Language: English
Type: Book, Thesis, Electronic Dissertation, text
Format: 730977 bytes, electronic, application/pdf, born digital
