Measures of statistical dependence for feature selection : Computational study / Mått på statistiskt beroende för funktionsval : Beräkningsstudie

Feature selection matters for statistical and machine learning models because it improves their explainability and makes it possible to explore new relationships, leading to new discoveries. Straightforward feature selection methods measure the dependencies between the potential features and the response variable. This thesis studies the selection of features according to a maximal statistical dependency criterion based on generalized Pearson’s correlation coefficients, e.g., Wijayatunga’s coefficient. I present a framework for feature selection based on these coefficients for high-dimensional feature variables. The results are compared to those obtained by applying elastic net regression (for high-dimensional data). The generalized Pearson’s correlation coefficient is a metric-based measure in which the metric is the Hellinger distance, considered as a distance between probability distributions. Wijayatunga’s coefficient was originally proposed for the discrete case; here, we generalize it to continuous variables by discretization and kernelization. Of particular interest is how the discretized version behaves as the bins are made finer. The study employs both synthetic and real-world data to illustrate the validity and power of this feature selection process. Moreover, a new method of normalization for mutual information is included. The results show that both measures have considerable potential in detecting associations. The feature selection experiment shows that elastic net regression is superior to our proposed method; nevertheless, more investigation could be done on this subject.
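To make the discretization idea concrete, here is a minimal sketch, not the thesis's implementation. It bins two continuous variables, then takes the Hellinger distance between the binned joint distribution and the product of its marginals as a dependence score. The function names `hellinger_dependence` and `rank_features`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration. The score is the raw distance, without the normalization that Wijayatunga's coefficient applies.

```python
import numpy as np

def hellinger_dependence(x, y, bins=10):
    """Hellinger distance between the binned joint distribution of (x, y)
    and the product of its marginals (hypothetical helper, equal-width bins).
    A value of 0 means the binned table is exactly independent."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()                # empirical joint probabilities
    px = joint.sum(axis=1, keepdims=True)      # marginal of x (column vector)
    py = joint.sum(axis=0, keepdims=True)      # marginal of y (row vector)
    indep = px * py                            # joint under independence
    bc = np.sum(np.sqrt(joint * indep))        # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))  # Hellinger distance in [0, 1]

def rank_features(X, y, bins=10):
    """Score every column of X against y and return the column indices
    sorted from most to least dependent, together with the scores."""
    scores = np.array(
        [hellinger_dependence(X[:, j], y, bins) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]
    return order, scores
```

Calling `rank_features(X, y)` on a feature matrix `X` and response `y` gives a simple filter-style ranking. Repeating the call with larger `bins` shows how the estimate changes as the discretization gets finer; with a fixed sample size, finer bins tend to inflate the score even for independent variables, so scores are only comparable at the same bin count.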

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-196895
Date: January 2022
Creators: Alshalabi, Mohamad
Publisher: Umeå universitet, Statistik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess