1 |
Detection and dynamic of local communities in large social networks / Détection et dynamique des communautés locales dans les grands réseaux sociauxNgonmang Kaledje, Christel Blaise 27 November 2014 (has links)
Les réseaux sont présents dans plusieurs contextes et applications : biologie, transports, réseaux sociaux en ligne, etc. De nombreuses applications récentes traitent d'immenses volumes de données personnelles. Les liens entre les personnes dans ces données peuvent traduire des liens d'amitiés, des échanges de messages, ou des intérêts communs. Les entités impliquées dans les réseaux, et spécialement les personnes, ont tendance à former des communautés. Dans ce contexte, une communauté peut être définie comme un ensemble d'entités qui interagissent beaucoup plus entre elles qu'avec le reste du réseau. La détection de communautés dans les grands réseaux a largement été étudiée pendant ces dernières années, suite aux travaux précurseurs de Newman qui a introduit le critère de modularité. Toutefois, la majorité des algorithmes de détection de communautés supposent que le réseau est complètement connu et qu'il n'évolue pas avec le temps. Dans cette thèse, nous commençons par proposer de nouvelles méthodes pour la détection de communautés locales (en considérant uniquement le voisinage d'un nœud donné et sans accéder à la totalité du réseau). Nos algorithmes sont plus efficaces que ceux de l'état de l'art. Nous montrons ensuite comment utiliser les communautés détectées pour améliorer la prévision de comportements utilisateurs. Dans un deuxième temps, nous proposons des approches pour prévoir l'évolution des communautés détectées. Ces méthodes sont basées sur des techniques d'apprentissage automatique. Enfin, nous proposons un framework général pour stocker et analyser les réseaux distribués dans un environnement "Big Data" . Les méthodes proposées sont validées en utilisant (entre autre) des données réelles issues d'un partenaire industriel fournissant un des réseaux en ligne les plus utilisés en France (40 millions d'utilisateurs). / Complex networks arises in many contexts and applications : biology, transports, online social networks (ONS). Many recent applications deal with large amount of personal data. The links between peoples may reflect freindship, messaging, or some common interests. Entities in complex network, and espacially persons, tend to form communities. Here, a community can be defined as a set of entities interacting more between each other than with the rest of the network. The topic of community detection in large networks as been extensively studied during the last decades, following the seminal work by newman, who popularized the modularity criteria. However, most community detection algorithms assume that the network is entirely known and that is does not evolve with time. This is usually not true in real world applications. In this thesis, we start by proposing novel methods for local community identification (considering only the vicinity of a given node, without accessing the whole graph). Our algorithms experimentally outperform the state-of-art methods. We show how to use the local communities to enhance the prediction of a user's behaviour. Secondly, we propose some approaches to predict the evolution of the detected communities based on machine learning methods. Finally we propose a framework for storing and processing distributed social networks in a Big Data environment. The proposed methods are validated using (among others) real world data, provided by a industrial partner operating a major social network platform in France (40 millions of users).
|
2 |
Efficient Detection of Overlapping Communities in Large GraphsMillson, Richard 19 January 2022 (has links)
This thesis proposes an algorithm for the efficient detection of overlapping communities in large graphs. Only super-fast local algorithms like Louvain are really practical for very large datasets, but they tend to give hierarchical rather than overlapping partitions. We develop some techniques that let you get reasonable families of overlapping partitions while preserving most of the good properties of Louvain. We build off an advance in the efficient detection of separated communities, the multilevel Louvain method, and draw inspiration from the Wang-Landau efficiency improvement to Markov chain Monte Carlo sampling. Partitions are iteratively proposed by Louvain, with the internal edges of the best parts downweighted after each step. This suppresses the dominant parts in subsequent partitions, allowing alternative parts to appear. The result is an ensemble of parts describing the overlapping structure of the network.
|
3 |
IDLE: A Novel Approach to Improving Overlapping Community Detection in Complex NetworksSenthil, Rathna 18 April 2016 (has links)
Complex systems in areas such as biology, physics, social science, and technology are extensively modeled as networks due to the rich set of tools available for their study and analysis. In such networks, groups of nodes that correspond to functional units or those that share some common attributes result in densely connected structures called communities. Community formation is an inherent process, and it is not easy to detect these structures because of the complex ways in which components of these systems interact.
Detecting communities in complex networks is important because it helps us to understand their internal dynamics better, thereby leading to significant insights into the underlying systems. Overlapping communities are formed when nodes in the network simultaneously belong to more than one community, and it has been shown that most real networks naturally contain such an overlapping community structure. In this thesis, I introduce a new approach to overlapping community detection called IDLE that incorporates ideas from another interesting problem: the identification of influential spreaders. Influential spreaders are nodes that play an important role in the propagation of information or diseases in networks. Research suggests that the main core identified by k-core decomposition techniques are the most influential spreaders. In my approach, I use these k-cores as candidate seeds for local community detection. Following a well-defined seed selection process, IDLE builds and prunes their corresponding local communities. It then augments the resulting local communities and puts them together to obtain the global overlapping community structure of the network.
My approach improves on the current local community detection techniques, because they use either random nodes or maximal k-cliques as seeds, and they do not focus explicitly on detecting overlapping nodes in the network. Hence their results can be significantly improved in building ground-truth overlapping communities. The results of my experiments on real and synthetic networks indicate that IDLE results in enhanced overlapping community detection and thereby a better identification of overlapping nodes that could be important or influential components in the underlying system. / Master of Science
|
4 |
Modern aspects of unsupervised learningLiang, Yingyu 27 August 2014 (has links)
Unsupervised learning has become more and more important due to the recent explosion of data. Clustering, a key topic in unsupervised learning, is a well-studied task arising in many applications ranging from computer vision to computational biology to the social sciences. This thesis is a collection of work exploring two modern aspects of clustering: stability and scalability.
In the first part, we study clustering under a stability property called perturbation resilience. As an alternative approach to worst case analysis, this novel theoretical framework aims at understanding the complexity of clustering instances that satisfy natural stability assumptions. In particular, we show how to correctly cluster instances whose optimal solutions are resilient to small multiplicative perturbations on the distances between data points, significantly improving existing guarantees. We further propose a generalized property that allows small changes in the optimal solutions after perturbations, and provide the first known positive results in this more challenging setting.
In the second part, we study the problem of clustering large scale data distributed across nodes which communicate over the edges of a connected graph. We provide algorithms with small communication cost and provable guarantees on the clustering quality. We also propose algorithms for distributed principal component analysis, which can be used to reduce the communication cost of clustering high dimensional data while merely comprising the clustering quality.
In the third part, we study community detection, the modern extension of clustering to network data. We propose a theoretical model of communities that are stable in the presence of noisy nodes in the network, and design an algorithm that provably detects all such communities. We also provide a local algorithm for large scale networks, whose running time depends on the sizes of the output communities but not that of the entire network.
|
5 |
Enhancing Recommendations for Conference Participants with Community and Topic ModelingPasham, Bharath January 2013 (has links)
§ For a researcher it is always important to increase his/her social capital and excel attheir research area. For this, conferences act as perfect medium where researchers meetand present their work. However, due to the structure of the conferences finding similarauthors or interesting talks is not obvious for the researchers. One of most importantobservation made from the conferences is, researchers tend to form communities withcertain research topics as the series of conferences progresses. These communitiesand their research topics could be used in helping researchers find their potentialcollaborators and in attending interesting talks. In this research we present the design and implementation of a recommender systemwhich is built to provide recommendation of authors and talks at the conferences.Various concepts like Social Network Analysis (SNA), context awareness, communityanalysis, and topic modeling are used to build the system. This system can beconsidered as an extension to the previous system CAMRS (Context Aware MobileRecommender System). CAMRS is a mobile application which serves the same purposeas the current system. However, CAMRS uses only SNA and context to providerecommendations. Current system, CAMRS-2, is also an Android application builtusing REST based architecture. The system is successfully is deployed, and as partof thesis the system is evaluated. The evaluation results proved CAMRS-2 providesbetter recommendations over its predecessor.
|
6 |
Partitionnement de grands graphes : mesures, algorithmes et visualisation / Graph Partitioning : measures, algorithms and visualizationQueyroi, François 10 October 2013 (has links)
L'analyse de réseaux (représentés par des graphes) est une composante importante dans la compréhension de systèmes complexes issus de nombreuses disciplines telles que la biologie, la géographie ou la sociologie. Nous nous intéressons dans cette thèse aux décompositions de ces réseaux. Ces décompositions sont utiles pour la compression des données, la détection de communautés ou la visualisation de graphes. Une décomposition possible est un partitionnement hiérarchique des sommets du graphe. Nous traitons de l'évaluation de la qualité de telles structures (leur capacité à bien capturer la topologie du graphe) par le biais de mesures de qualité. Nous discutons ensuite l'utilisation de ces mesures en tant que fonctions objectives à maximiser dans le cadre d'algorithmes de partitionnement. Enfin, nous nous intéressons à la définition de métaphores visuelles efficaces permettant de représenter différentes décompositions de graphes. / Network analysis is an important step in the understanding of complex systems studied in various areas such as biology, geography or sociology. This thesis focuses on the problems related to the decomposition of those networks when they are modeled by graphs. Graph decomposition methods are useful for data compression, community detection or network visualisation. One possible decomposition is a hierarchical partition of the set of vertices. We propose a method to evaluate the quality of such structures using quality measures and algorithms to maximise those measures. We also discuss the design of effective visual metaphors to represent various graph decompositions.
|
7 |
Partitioning temporal networks : A study of finding the optimal partition of temporal networks using community detectionAxel, Lindegren January 2018 (has links)
Many of the algorithms used for community detection in temporal networks have been adapted from static network theory. A common approach in dealing with the temporal dimension is to create multiple static networks from one temporal, based on a time condition. In this thesis, focus lies on identifying the optimal partitioning of a few temporal networks. This is done by utilizing the popular community detection algorithm called Generalized Louvain. Output of the Generalized Louvain comes in two parts. First, the created community structure, i.e. how the network is connected. Secondly, a measure called modularity, which is a scalar value representing the quality of the identified community structure. The methodology used is aimed at creating a comparable result by normalizing modularity. The normalization process can be explained in two major steps: 1) study the effects on modularity when partitioning a temporal network in an increasing number of slices. 2) study the effects on modularity when varying the number of connections (edges) in each time slice. The results show that the created methodology yields comparable results on two out of the four here tested temporal networks, implying that it might be more suited for some networks than others. This can serve as an indication that there does not exist a general model for community detection in temporal networks. Instead, the type of network is key to choosing the method.
|
8 |
Finding Community Structures In Social Activity DataPeng, Chengbin 19 May 2015 (has links)
Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understand- ing the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in tra c networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, qual- ity of inference and e ciency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve com- munity detection in these aspects.
The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes.
The second approach is to scale the algorithm to multi-processor systems. We de- vise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood ap- proach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The
time complexity is linear in the number of edges of the input network.
The third approach is to improve the quality. We propose a framework for non- negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community
structures in bipartite networks, which is useful in recommender systems.
Our algorithms are compared with the results in recent papers and their quality
and e ciency are verified by experiments.
|
9 |
Revisiting Network SamplingWang, Yu 11 July 2019 (has links)
No description available.
|
10 |
Efficient Skyline Community Discovery in Large NetworksAkber, Mohammad Ali 30 August 2022 (has links)
Every entity in the real world can be described uniquely by it’s attributes. It is possible to rank similar entities based on these attributes, i.e. a professor can be ranked by his/her number of publications, citations etc. A community is formed by a group of connected entities. Individual ranking of an entity plays an important role in the quality of a community. Skyline community in a network represents the highest ranked communities in the network. But how do we define this ranking? Ranking system in some model considers only a single attribute [16], whereas the other [15] [23] considers multiple attributes. Intuitively multiple attributes represent a community better and produce good results. We propose a novel community discovery model, which considers multiple attribute when ranking the community and is efficient in terms of computation time and result size. We use a progressive (can produce re- sults gradually without depending on the future processing) algorithm to calculate the community in an order such that a community is guaranteed not to be dominated by those generated after it. And to verify the dominance relationship between two communities, we came up with a range based comparison where the dominance rela- tionship is decided by the set of nodes each group dominates. If domination list of a group is a subset of another group, we say the second group dominates the first. Because a groups domination list contains it’s member along with the nodes they dominate. So in the example, the second group dominates every node of the first group. / Graduate
|
Page generated in 0.0966 seconds