Evaluating separability is fundamental to pattern recognition. A plethora of embedding methods, such as dimension reduction and network embedding algorithms, have been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample and node similarities are approximated by geometrical distances. However, statistical measures for evaluating the separability attained by the embedded representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored to evaluating the separability of embedded results. This work introduces a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the separability of data samples in a reduced (i.e., low-dimensional) geometrical space. In a first case study, a new class of indices based on this rationale, named projection separability indices (PSIs), is implemented using four statistical measures: the Mann-Whitney U-test p-value, the Area Under the ROC Curve, the Area Under the Precision-Recall Curve, and the Matthews Correlation Coefficient. The PSIs are compared with six representative cluster validity indices and one geometrical separability index on seven nonlinear datasets and six different dimension reduction algorithms. In a second case study, the PS rationale is extended to define and measure the geometric separability (linear and nonlinear) of mesoscale patterns in complex data visualization by solving the traveling salesman problem. This second study offers experimental evidence on the evaluation of community separability of network embedding results, using eight real network datasets and three network embedding algorithms.
The results of both studies provide evidence that the statistics-based measures implemented on the basis of the PS rationale are more accurate than the other indices tested. They can be adopted not only for evaluating and comparing the separability of embedded results in the low-dimensional space but also for fine-tuning the hyperparameters of embedding algorithms. Beyond these advantages, the PS rationale can be used to design new statistics-based separability measures other than the ones presented in this work, providing the community with a novel and flexible framework for assessing separability.
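The abstract names four statistical measures but does not spell out how embedded points are turned into the scores those measures consume. The sketch below is one plausible reading under stated assumptions, not the thesis implementation: for two groups of embedded points, each point is projected onto the line joining the two group centroids, and the resulting one-dimensional scores are fed to the four measures. The centroid-based projection line, the normal approximation for the Mann-Whitney p-value (no tie or continuity correction), average precision as the AUPR estimate, and taking the best-threshold MCC are all assumptions; every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a group of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def project_onto_centroid_line(group_a: List[Point], group_b: List[Point]) -> Tuple[List[float], List[float]]:
    """Scalar projection of every point onto the line from centroid(a) to centroid(b).

    ASSUMPTION: the centroid-to-centroid line is used purely for illustration.
    """
    ca, cb = centroid(group_a), centroid(group_b)
    direction = [b - a for a, b in zip(ca, cb)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("group centroids coincide; projection line undefined")
    unit = [d / norm for d in direction]

    def proj(p: Point) -> float:
        return sum((x - a) * u for x, a, u in zip(p, ca, unit))

    return [proj(p) for p in group_a], [proj(p) for p in group_b]


def mann_whitney_u(scores_a: List[float], scores_b: List[float]) -> float:
    """U for group b: number of (a, b) pairs where b scores higher, ties counted as 0.5."""
    u = 0.0
    for sb in scores_b:
        for sa in scores_a:
            if sb > sa:
                u += 1.0
            elif sb == sa:
                u += 0.5
    return u


def mann_whitney_p(scores_a: List[float], scores_b: List[float]) -> float:
    """Two-sided p-value via the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(scores_a), len(scores_b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (mann_whitney_u(scores_a, scores_b) - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def auc_roc(scores_a: List[float], scores_b: List[float]) -> float:
    """AUC-ROC with group b as the positive class, equal to U / (n_a * n_b)."""
    return mann_whitney_u(scores_a, scores_b) / (len(scores_a) * len(scores_b))


def average_precision(scores_a: List[float], scores_b: List[float]) -> float:
    """AUPR estimated as average precision, group b positive; ties keep input order."""
    ranked = sorted([(s, 0) for s in scores_a] + [(s, 1) for s in scores_b], key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / len(scores_b)


def best_mcc(scores_a: List[float], scores_b: List[float]) -> float:
    """Maximum MCC over thresholds at each observed score (predict b when score >= t)."""
    best = -1.0
    for t in sorted(set(scores_a) | set(scores_b)):
        tp = sum(s >= t for s in scores_b)
        fn = len(scores_b) - tp
        fp = sum(s >= t for s in scores_a)
        tn = len(scores_a) - fp
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        if denom > 0:  # skip degenerate thresholds where MCC is undefined
            best = max(best, (tp * tn - fp * fn) / denom)
    return best


if __name__ == "__main__":
    # Two well-separated 2-D groups, as an embedding might produce.
    a = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]
    b = [(4.0, 4.0), (5.0, 4.5), (4.5, 5.0)]
    pa, pb = project_onto_centroid_line(a, b)
    print(mann_whitney_p(pa, pb), auc_roc(pa, pb), average_precision(pa, pb), best_mcc(pa, pb))
```

With perfectly separated groups, the AUC, average precision, and best MCC all reach 1.0, while the Mann-Whitney p-value stays near 0.05 because only three points per group are available. This illustrates why a p-value-based index and a ranking-based index can disagree on small samples.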
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:89531 |
Date | 06 February 2024 |
Creators | Acevedo Toledo, Aldo Marcelino |
Contributors | Schroeder, Michael, Cannistraci, Carlo Vittorio, Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | 10.1109/ACCESS.2022.3152789 |