1 |
Approches modèles pour la structuration du web vu comme un graphe / Model based approaches for uncovering web structures
Zanghi, Hugo, 25 June 2010
The statistical analysis of complex networks is a challenging task, because learning the underlying structures requires both appropriate statistical models and efficient computational procedures. These models assume that the edge values follow a parametric distribution, conditionally on a latent structure that is used to detect connectivity patterns. However, because the dependencies are complex, these methods suffer from relatively slow estimation procedures. In this thesis we adapt online estimation strategies, originally developed for the EM algorithm, to graph models. In addition to the network data used by the methods above, vertex content is sometimes available. We therefore propose algorithms for clustering data sets that can be modeled as a graph whose vertices carry features. Finally, an online Web application, based on the Exalead search engine, showcases certain aspects of this thesis.
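To make the idea of "edges following a parametric distribution given a latent structure" concrete, here is a minimal sketch. It assumes a Bernoulli stochastic block model, which the abstract does not name explicitly; the function name `sbm_log_likelihood` and the parameter matrix `pi` are illustrative, not taken from the thesis.

```python
import math

def sbm_log_likelihood(adj, labels, pi):
    """Log-likelihood of an undirected graph under an assumed Bernoulli block model.

    adj    : symmetric 0/1 adjacency matrix (list of lists)
    labels : latent class of each vertex
    pi     : pi[q][l] is the edge probability between classes q and l
    """
    n = len(adj)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = pi[labels[i]][labels[j]]
            # Each vertex pair contributes one Bernoulli term, given the latent classes.
            ll += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return ll

# Hypothetical toy graph: two connected pairs, (0, 1) and (2, 3).
adj = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
pi = [[0.9, 0.1], [0.1, 0.9]]
good = sbm_log_likelihood(adj, [0, 0, 1, 1], pi)
bad = sbm_log_likelihood(adj, [0, 1, 0, 1], pi)
```

In this toy case the labeling that matches the two connected pairs scores higher than the mismatched one, which is the signal an estimation procedure such as EM would exploit.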
|
2 |
Optimization Frameworks for Graph Clustering
Luke N Veldt (6636218), 15 May 2019
In graph theory and network analysis, communities or clusters are sets of nodes in a graph that share many internal connections with each other, but are only sparsely connected to nodes outside the set. Graph clustering, the computational task of detecting these communities, has been studied extensively due to its widespread applications and its theoretical richness as a mathematical problem. This thesis presents novel optimization tools for addressing two major challenges associated with graph clustering.
The first major challenge is that there already exists a plethora of algorithms and objective functions for graph clustering. The relationship between different methods is often unclear, and it can be very difficult to determine in practice which approach is the best to use for a specific application. To address this challenge, we introduce a generalized discrete optimization framework for graph clustering called LambdaCC, which relies on a single tunable parameter. The value of this parameter controls the balance between the internal density and external sparsity of clusters that are formed by optimizing an underlying objective function. LambdaCC unifies the landscape of graph clustering techniques, as a large number of previously developed approaches can be recovered as special cases for a fixed value of the LambdaCC input parameter.
The second major challenge of graph clustering is the computational intractability of detecting the best way to cluster a graph with respect to a given NP-hard objective function. To address this intractability, we present new optimization tools and results which apply to LambdaCC as well as a broader class of graph clustering problems. In particular, we develop polynomial-time approximation algorithms for LambdaCC and other more generalized clustering objectives. Notably, we show how to obtain a polynomial-time 2-approximation for cluster deletion, which improves upon the previous best approximation factor of 3. We also present a new optimization framework for solving convex relaxations of NP-hard graph clustering problems, which are frequently used in the design of approximation algorithms. Finally, we develop a new framework for efficiently setting tunable parameters for graph clustering objective functions, so that practitioners can work with graph clustering techniques that are especially well suited to their application.
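The abstract does not state the LambdaCC objective itself. As a hedged illustration of how a single parameter can trade internal density against external sparsity, the sketch below assumes the unweighted correlation-clustering form in which a cut edge costs 1 - lam and a missing edge inside a cluster costs lam. The function name `lambda_cc_cost` and the toy graph are hypothetical.

```python
from itertools import combinations

def lambda_cc_cost(nodes, edges, labels, lam):
    """Disagreement cost of a clustering under an assumed LambdaCC-style objective."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        if frozenset((u, v)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam   # an edge cut between clusters
        elif same_cluster:
            cost += lam             # a non-edge placed inside a cluster
    return cost

# Hypothetical toy graph: a triangle {0, 1, 2} with a pendant node 3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1}     # the triangle and the pendant node apart
merged = {0: 0, 1: 0, 2: 0, 3: 0}    # everything in one cluster
```

Under this assumed form, a small `lam` favors the merged clustering and a large `lam` favors splitting off the pendant node, which matches the abstract's description of the parameter balancing density against sparsity.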
|
3 |
Graph clustering as a method to investigate riboswitch variation
Crum, Matthew, January 2021
Thesis advisor: Michelle M. Meyer / Non-coding RNAs (ncRNAs) perform vital functions in cells, but the impact of diversity across the structure and function of homologous motifs has yet to be fully investigated. One reason for this is that the standard phylogenetic analysis used to address these questions in proteins cannot easily be applied to ncRNAs due to their inherent characteristics. Compared to proteins, ncRNAs have shorter sequence lengths, lower sequence conservation, and secondary structures that need to be incorporated into the analysis. This has necessitated an effort to develop methodology for investigating the evolutionary and functional relationships between sets of ncRNAs. In this pursuit, I studied closely related riboswitches. Riboswitches are structured ncRNAs found in bacterial mRNA that regulate gene expression using their two major components: the aptamer and the expression platform. The aptamer of a riboswitch is able to bind a specific small molecule (ligand), and the bound/unbound state of the aptamer influences conformational changes in the expression platform that can lead to increased or decreased downstream gene expression. Utilizing sequence and structural similarity metrics combined with graph clustering and de novo community detection algorithms, I have determined a methodology for investigating the functional and evolutionary relationships between closely related riboswitches, and other ncRNAs by extension, that are found across a range of diverse phyla. / Thesis (PhD), Boston College, 2021. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology.
|
4 |
Efficient Detection of Overlapping Communities in Large Graphs
Millson, Richard, 19 January 2022
This thesis proposes an algorithm for the efficient detection of overlapping communities in large graphs. Only very fast local algorithms like Louvain are practical for very large datasets, but they tend to give hierarchical rather than overlapping partitions. We develop techniques that yield reasonable families of overlapping partitions while preserving most of the good properties of Louvain. We build on an advance in the efficient detection of separated communities, the multilevel Louvain method, and draw inspiration from the Wang-Landau efficiency improvement to Markov chain Monte Carlo sampling. Partitions are iteratively proposed by Louvain, with the internal edges of the best parts downweighted after each step. This suppresses the dominant parts in subsequent partitions, allowing alternative parts to appear. The result is an ensemble of parts describing the overlapping structure of the network.
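The propose-then-downweight loop described above can be sketched as follows. This is an illustrative outline only: the abstract does not say how the "best" parts are chosen or by how much edges are downweighted, so `pick_best`, `factor`, and `rounds` are hypothetical parameters, and `partition_fn` stands in for any Louvain implementation.

```python
def downweight_internal_edges(weights, parts, factor=0.5):
    """Return a new edge-weight map with edges inside any given part scaled by factor."""
    member = {}
    for idx, part in enumerate(parts):
        for node in part:
            member[node] = idx
    scaled = {}
    for (u, v), w in weights.items():
        # An edge is internal when both endpoints fall in the same selected part.
        inside = u in member and member.get(v) == member[u]
        scaled[(u, v)] = w * factor if inside else w
    return scaled

def collect_overlapping_parts(weights, partition_fn, pick_best, rounds=3, factor=0.5):
    """Repeatedly partition, keep the selected parts, and suppress them for the next round."""
    ensemble = []
    for _ in range(rounds):
        parts = partition_fn(weights)      # e.g. one Louvain run on the current weights
        best = pick_best(parts, weights)   # assumed selection rule, not from the thesis
        ensemble.extend(best)
        weights = downweight_internal_edges(weights, best, factor)
    return ensemble

# Hypothetical toy input: a path 0-1-2-3 where {0, 1, 2} was selected as a best part.
toy = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
after = downweight_internal_edges(toy, [{0, 1, 2}], factor=0.5)
```

Because parts from different rounds can share nodes, the collected ensemble can describe overlapping structure even though each individual partition is disjoint.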
|
5 |
Spiking Neural Networks: Neuron Models, Plasticity, and Graph Applications
Donachy, Shaun, 01 January 2015
Networks of spiking neurons can be used not only for brain modeling but also to solve graph problems. When the computationally efficient Izhikevich neuron model is combined with plasticity rules, the networks exhibit self-organizing characteristics. Two different time-based synaptic plasticity rules are used to adjust weights among nodes in a graph, resulting in solutions to graph problems such as finding the shortest path and clustering.
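For readers unfamiliar with the Izhikevich model mentioned above, a minimal single-neuron sketch follows. It uses the commonly published "regular spiking" parameter values and plain forward-Euler integration; the step size, input currents, and function name are assumptions for illustration, and the thesis's plasticity rules are not modeled here.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron under constant input and return spike times in ms."""
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    for step in range(int(duration_ms / dt)):
        # Forward-Euler update of the two coupled equations.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * (a * (b * v - u))
        if v >= 30.0:              # spike threshold reached: record and reset
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times

driven = simulate_izhikevich(current=10.0)   # constant drive
silent = simulate_izhikevich(current=0.0)    # no input
```

With constant drive the neuron fires repeatedly, while with no input it settles to rest; spike timing of this kind is what time-based plasticity rules operate on.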
|
6 |
Range-Based Graph Clustering
Luo, Yongfeng, 11 March 2002
No description available.
|
7 |
Scalable Clustering of Modern Networks
Satuluri, Venu M., 20 June 2012
No description available.
|
8 |
Graph Mining Algorithms for Memory Leak Diagnosis and Biological Database Clustering
Maxwell, Evan Kyle, 29 July 2010
Large graph-based datasets are common to many applications because of the additional structure provided to data by graphs. Patterns extracted from graphs must adhere to these structural properties, making them a more complex class of patterns to identify. The role of graph mining is to efficiently extract these patterns and quantify their significance. In this thesis, we focus on two application domains and demonstrate the design of graph mining algorithms in these domains.
First, we investigate the use of graph grammar mining as a tool for diagnosing potential memory leaks from Java heap dumps. Memory leaks occur when memory that is no longer in use fails to be reclaimed, resulting in significant slowdowns, exhaustion of available storage, and eventually application crashes. Analyzing the heap dump of a program is a common strategy used in memory leak diagnosis, but our work is the first to employ a graph mining approach to the problem. Memory leaks accumulate in the heap as classes of subgraphs and the allocation paths from which they emanate can be explored to contextualize the leak source. We show that it suffices to mine the dominator tree of the heap dump, which is significantly smaller than the underlying graph. We demonstrate several synthetic as well as real-world examples of heap dumps for which our approach provides more insight into the problem than state-of-the-art tools such as Eclipse's MAT.
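As background for the dominator tree mentioned above, the following sketch computes immediate dominators of a small object graph with the classic iterative set-intersection method. It only illustrates what a dominator tree is; it is not the thesis's mining algorithm, and the function name and toy heap are hypothetical.

```python
def immediate_dominators(graph, root):
    """Map each node reachable from root to its immediate dominator."""
    # Collect reachable nodes with a depth-first traversal.
    order, seen, stack = [], {root}, [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    preds = {n: set() for n in order}
    for n in order:
        for nxt in graph.get(n, ()):
            preds[nxt].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(order) for n in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in order:
            if n == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    # Strict dominators form a chain; the closest one has the largest dominator set.
    return {n: max(dom[n] - {n}, key=lambda s: len(dom[s]))
            for n in order if n != root}

# Hypothetical heap: object "c" is reachable through both "a" and "b".
heap = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
idom = immediate_dominators(heap, "root")
```

Here `c` is dominated directly by `root` because neither `a` nor `b` alone controls access to it, which is why the dominator tree can be much smaller and simpler than the full reference graph.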
Second, we study the problem of multipartite graph clustering as an approach to database summarization on an integrated biological database. Construction of such databases has become a common theme in biological research, where heterogeneous data is consolidated into a single, centralized repository that provides a structured forum for data analysis. We present an efficient approximation algorithm for identifying clusters that form multipartite cliques spanning multiple database tables. We show that our algorithm computes a lossless compression of the database by summarizing it into a reduced set of biologically meaningful clusters. Our algorithm is applied to data from C. elegans, but we note its applicability to general relational databases. / Master of Science
|
9 |
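Detecção automática de massas em imagens mamográficas usando particle swarm optimization (PSO) e índice de diversidade funcional [Automatic detection of masses in mammographic images using particle swarm optimization (PSO) and the functional diversity index]
Silva Neto, Otilio Paulo da, 04 March 2016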
Breast cancer is currently the most common cancer among women worldwide and the second deadliest. When it is diagnosed early, the chance of cure is substantial; late discovery, on the other hand, almost always leads to death. Mammography is the most common exam for early detection: it can reveal lesions in their initial stages and contributes to the discovery and diagnosis of breast lesions. Computer-aided systems have proven to be very important tools for assisting specialists in diagnosing lesions. This work proposes a computational methodology to assist in the discovery of masses in dense and non-dense breasts. The methodology is divided into six stages. The first is the acquisition of breast images from the Digital Database for Screening Mammography (DDSM). The second is preprocessing, to eliminate and enhance image structures. The third is segmentation with Particle Swarm Optimization (PSO) to find regions of interest (ROIs) that are candidate masses. The fourth is false-positive reduction, split into two parts, reduction by distance and graph clustering, both aimed at removing unwanted ROIs. The fifth extracts texture features using functional diversity (FD) indices. Finally, the sixth uses a support vector machine (SVM) classifier to validate the proposed methodology. The best results for non-dense breasts were a sensitivity of 96.13%, specificity of 91.17%, accuracy of 93.52%, a rate of 0.64 false positives per image, and a free-response receiver operating characteristic (FROC) curve value of 0.98. The best results for dense breasts were a sensitivity of 97.52%, specificity of 92.28%, accuracy of 94.82%, a rate of 0.38 false positives per image, and a FROC curve value of 0.99. The best results over all breasts, dense and non-dense, were 95.36% sensitivity, 89.00% specificity, 92.00% accuracy, 0.75 false positives per image, and a FROC curve value of 0.98.
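The sensitivity, specificity, and accuracy figures above are presumably computed with the usual confusion-matrix definitions. The sketch below shows those definitions; the counts are hypothetical and are not data from the dissertation.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of true masses that were detected
    specificity = tn / (tn + fp)                 # share of non-masses correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all ROIs classified correctly
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only.
sens, spec, acc = screening_metrics(tp=90, fp=20, tn=80, fn=10)
```

The false-positive rate per image is a separate quantity: the number of false-positive ROIs divided by the number of images evaluated.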
|
10 |
A Polymorphic Ant-Based Algorithm for Graph Clustering
Liu, Ying Ying, 12 April 2016
In this thesis, I introduce two new algorithms for the graph clustering problem: Ant Brood Clustering-Intelligent Ants (ABC-INTE) and Ant Brood Clustering-Polymorphic Ants (ABC-POLY). ABC-INTE improves on the basic ABC-KLS algorithm with techniques such as hopping ants, a relaxed drop function, ants with memories, stagnation control, and an added k-means cluster retrieval process. ABC-POLY improves on ABC-INTE by using two types of ants, inspired by the division of labour between the major and minor ants in the genus Pheidole. For comparison, I also implement MMAS, an ACO clustering algorithm. When tested on the benchmark networks, ABC-POLY outperforms or matches the modularity values of MMAS and ABC-INTE on 7 out of 10 networks and is robust across different graphs. In practice, ABC-POLY is at least 10 times faster than MMAS, making it the more scalable of the two. ABC-POLY also outputs a direct visual representation of the natural clusters on the graph that is easy for a human observer to interpret. This thesis opens an interesting research topic: applying polymorphic ants to graph clustering, as in the ABC-POLY algorithm. The distributed and self-organizing nature of ABC-POLY makes it a candidate for analyzing clusters in more complex and dynamic graphs. / May 2016
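The modularity values used for comparison above are presumably Newman's modularity for unweighted, undirected graphs. A minimal sketch of that measure follows; the function name and the toy graph are illustrative and are not among the thesis's benchmark networks.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}   # number of edges with both endpoints in a community
    degree = {}     # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(internal.get(c, 0) / m - (deg / (2 * m)) ** 2
               for c, deg in degree.items())

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_parts = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_part = {n: "a" for n in range(6)}
q_two = modularity(edges, two_parts)
q_one = modularity(edges, one_part)
```

Splitting the toy graph into its two triangles gives a positive score, while lumping all nodes together gives zero, so higher modularity indicates a partition that better matches the dense groups.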
|