1 |
Agrupamento de dados fuzzy colaborativo / Collaborative fuzzy clusteringColetta, Luiz Fernando Sommaggio 19 May 2011 (has links)
Nas últimas décadas, as técnicas de mineração de dados têm desempenhado um importante papel em diversas áreas do conhecimento humano. Mais recentemente, essas ferramentas têm encontrado espaço em um novo e complexo domínio, nbo qual os dados a serem minerados estão fisicamente distribuídos. Nesse domínio, alguns algorithmos específicos para agrupamento de dados podem ser utilizados - em particular, algumas variantes do algoritmo amplamente Fuzzy C-Means (FCM), as quais têm sido investigadas sob o nome de agrupamento fuzzy colaborativo. Com o objetivo de superar algumas das limitações encontradas em dois desses algoritmos, cinco novos algoritmos foram desenvolvidos nesse trabalho. Esses algoritmos foram estudados em dois cenários específicos de aplicação que levam em conta duas suposições sobre os dados (i.e., se os dados são de uma mesma npopulação ou de diferentes populações). Na prática, tais suposições e a dificuldade em se definir alguns dos parâmetros (que possam ser requeridos), podemn orientar a escolha feita pelo usuário entre os algoitmos diponíveis. Nesse sentido, exemplos ilustrativos destacam as diferenças de desempenho entre os algoritmos estudados e desenvolvidos, permitindo derivar algumas conclusões que podem ser úteis ao aplicar agrupamento fuzzy colaborativo na prática. Análises de complexidade de tempo, espaço, e comunicação também foram realizadas / Data mining techniques have played in important role in several areas of human kwnowledge. More recently, these techniques have found space in a new and complex setting in which the data to be mined are physically distributed. In this setting algorithms for data clustering can be used, such as some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data ditributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustring. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set systematically structured guidelines as to a selection of the specific algorithm dependeing upon a nature of the data environment and the assumption being made about the number of clusters. A thorough complexity analysis including space, time, and communication aspects is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study
|
2 |
Agrupamento de dados fuzzy colaborativo / Collaborative fuzzy clusteringLuiz Fernando Sommaggio Coletta 19 May 2011 (has links)
Nas últimas décadas, as técnicas de mineração de dados têm desempenhado um importante papel em diversas áreas do conhecimento humano. Mais recentemente, essas ferramentas têm encontrado espaço em um novo e complexo domínio, nbo qual os dados a serem minerados estão fisicamente distribuídos. Nesse domínio, alguns algorithmos específicos para agrupamento de dados podem ser utilizados - em particular, algumas variantes do algoritmo amplamente Fuzzy C-Means (FCM), as quais têm sido investigadas sob o nome de agrupamento fuzzy colaborativo. Com o objetivo de superar algumas das limitações encontradas em dois desses algoritmos, cinco novos algoritmos foram desenvolvidos nesse trabalho. Esses algoritmos foram estudados em dois cenários específicos de aplicação que levam em conta duas suposições sobre os dados (i.e., se os dados são de uma mesma npopulação ou de diferentes populações). Na prática, tais suposições e a dificuldade em se definir alguns dos parâmetros (que possam ser requeridos), podemn orientar a escolha feita pelo usuário entre os algoitmos diponíveis. Nesse sentido, exemplos ilustrativos destacam as diferenças de desempenho entre os algoritmos estudados e desenvolvidos, permitindo derivar algumas conclusões que podem ser úteis ao aplicar agrupamento fuzzy colaborativo na prática. Análises de complexidade de tempo, espaço, e comunicação também foram realizadas / Data mining techniques have played in important role in several areas of human kwnowledge. More recently, these techniques have found space in a new and complex setting in which the data to be mined are physically distributed. In this setting algorithms for data clustering can be used, such as some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data ditributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustring. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set systematically structured guidelines as to a selection of the specific algorithm dependeing upon a nature of the data environment and the assumption being made about the number of clusters. A thorough complexity analysis including space, time, and communication aspects is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study
|
3 |
Projection separability: A new approach to evaluate embedding algorithms in the geometrical spaceAcevedo Toledo, Aldo Marcelino 06 February 2024 (has links)
Evaluating separability is fundamental to pattern recognition. A plethora of embedding methods, such as dimension reduction and network embedding algorithms, have been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample and node similarities are approximated by geometrical distances. However, statistical measures to evaluate the separability attained by the embedded representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored for evaluating the separability of embedded results. This work introduces a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the separability of data samples in a reduced (i.e., low-dimensional) geometrical space. In a first case study, using this rationale, a new class of indices named projection separability indices (PSIs) is implemented based on four statistical measures: Mann-Whitney U-test p-value, Area Under the ROC-Curve, Area Under the Precision-Recall Curve, and Matthews Correlation Coefficient. The PSIs are compared to six representative cluster validity indices and one geometrical separability index using seven nonlinear datasets and six different dimension reduction algorithms. In a second case study, the PS rationale is extended to define and measure the geometric separability (linear and nonlinear) of mesoscale patterns in complex data visualization by solving the traveling salesman problem, offering experimental evidence on the evaluation of community separability of network embedding results using eight real network datasets and three network embedding algorithms. The results of both studies provide evidence that the implemented statistical-based measures designed on the basis of the PS rationale are more accurate than the other indices and can be adopted not only for evaluating and comparing the separability of embedded results in the low-dimensional space but also for fine-tuning embedding algorithms’ hyperparameters. Besides these advantages, the PS rationale can be used to design new statistical-based separability measures other than the ones presented in this work, providing the community with a novel and flexible framework for assessing separability.
|
4 |
Evaluating of Fuzzy Clustering Results / Hodnocení Výsledků Fuzzy ShlukováníŘíhová, Elena January 2013 (has links)
Cluster analysis is a multivariate statistical classification method, implying different methods and procedures. Clustering methods can be divided into hard and fuzzy; the latter one provides a more precise picture of the information by clustering objects than hard clustering. But in practice, the optimal number of clusters is not known a priori, and therefore it is necessary to determine the optimal number of clusters. To solve this problem, the validity indices help us. However, there are many different validity indices to choose from. One of the goals of this work is to create a structured overview of existing validity indices and techniques for evaluating fuzzy clustering results in order to find the optimal number of clusters. The main aim was to propose a new index for evaluating the fuzzy clustering results, especially in cases with a large number of clusters (defined as more than five). The newly designed coefficient is based on the degrees of membership and on the distance (Euclidean distance) between the objects, i.e. based on principles from both fuzzy and hard clustering. The suitability of selected validity indices was applied on real and generated data sets with known optimal number of clusters a priory. These data sets have different sizes, different numbers of variables, and different numbers of clusters. The aim of the current work is regarded as fulfilled. A key contribution of this work was a new coefficient (E), which is appropriate for evaluating situations with both large and small numbers of clusters. Because the new validity index is based on the principles of both fuzzy clustering and hard clustering, it is able to correctly determine the optimal number of clusters on both small and large data sets. A second contribution of this research was a structured overview of existing validity indices and techniques for evaluating the fuzzy clustering results.
|
5 |
The Need for Validity Indices in Personality Assessment: A Demonstration Using the MMPI-2-RFBurchett, Danielle L. 07 July 2009 (has links)
No description available.
|
Page generated in 0.0579 seconds