
Correlation coefficient based feature screening: With applications to microarray data / Korrelationsbaserad dimensionsreduktion med tillämpning på data från mikromatriser (Swedish title: "Correlation-based dimension reduction with application to data from microarrays")

Measuring dependence between variables is of great importance in statistical analysis and can, for instance, be used for feature screening. It is therefore of interest to find measures that can quantify dependencies even when they are complex. Recently, the correlation coefficient ξₙ was proposed [1]; it is fast to compute and works particularly well when dependencies show an oscillatory or wiggly pattern. In this thesis, ξₙ was applied as a feature screening tool, and a comprehensive simulation study investigated how well it could detect dependencies between predictor variables and a response variable. The results showed that, compared with two other relatively new and popular dependence measures, the Hilbert-Schmidt Independence Criterion (HSIC) and distance correlation (DC), ξₙ was better at detecting dependencies when the variables were related through sine or cosine functions, and worse when they were related through certain other functions, such as exponential functions. As feature screening tools, ξₙ and DC were also applied to real microarray data to investigate whether they could give better results than feature screening with the t-test. For this particular data set, the t-test proved more efficient than either DC or ξₙ.
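To make the screening idea concrete, here is a minimal Python sketch. It is not the thesis's own code. It assumes that ξₙ is the rank-based coefficient cited as [1] and uses only its no-ties form: sort the pairs by x, let rᵢ be the rank of the y value in position i, and compute ξₙ = 1 − 3 Σ|rᵢ₊₁ − rᵢ| / (n² − 1). Screening then ranks the predictors by ξₙ(xⱼ, y) and keeps the top k. The function names `xi_correlation`, `screen_features` and `demo` are hypothetical. The toy data is also invented for illustration: one predictor is linked to the response through a sine, and three predictors are pure noise.

```python
import math
import random


def xi_correlation(x, y):
    """No-ties form of the xi_n coefficient of y on x (hypothetical helper).

    Assumes no ties in x or y: sort the pairs by x, let r_i be the rank of
    the y value in position i, then xi_n = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y values by increasing x.
    order = sorted(range(n), key=lambda i: x[i])
    # Rank of each y value among all y values (1-based; valid without ties).
    rank = {v: k + 1 for k, v in enumerate(sorted(y))}
    r = [rank[y[i]] for i in order]
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def screen_features(columns, y, k):
    """Score each predictor column by xi_n(x_j, y) and keep the k best indices."""
    scores = [xi_correlation(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def demo(n=500, seed=1):
    """Hypothetical toy data: y depends on column 0 via a sine; columns 1-3 are noise."""
    rng = random.Random(seed)
    columns = [[rng.uniform(-3.0, 3.0) for _ in range(n)] for _ in range(4)]
    y = [math.sin(4.0 * v) + rng.gauss(0.0, 0.05) for v in columns[0]]
    return screen_features(columns, y, k=1)


if __name__ == "__main__":
    selected, scores = demo()
    print("selected:", selected)
    print("scores:", [round(s, 3) for s in scores])
```

Sorting dominates the cost, so computing ξₙ for one predictor takes O(n log n) time. This fits the abstract's point that the coefficient is fast to compute. Note that ξₙ is not symmetric: `xi_correlation(x, y)` measures the degree to which y is a function of x, which is the direction needed when screening predictors against a response. The HSIC, DC and t-test screening used in the thesis are not shown here.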

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-196902
Date: January 2022
Creators: Holma, Agnes
Publisher: Umeå universitet, Statistik (Umeå University, Statistics)
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
