Global ETD Search

41	Unsupervised random walk node embeddings for network block structure representation Lin, Christy 25 September 2021 There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible to apply to network data the tools and algorithms from multivariate statistics and machine learning that were developed for Euclidean-space data. Random walk node embeddings are a class of recently developed node embedding techniques where the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, their properties remain poorly understood. This dissertation studies random walk based node embeddings in an unsupervised setting within the context of capturing hidden block structure in the network, i.e., learning node representations that reflect their patterns of adjacencies to other nodes. This doctoral research (i) develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance for the community detection problem under the Stochastic Block Model (SBM); (ii) characterizes the ergodic limits of the embedding objectives to create non-randomized versions; and (iii) analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large network asymptotic regime. Comprehensive experimental results on real world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms.
As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Grammian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases. / 2023-09-24T00:00:00Z Computer science Community detection Node embeddings Random walk Stochastic block model
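The skip-bigram statistics mentioned in entry 41 can be illustrated with a small sketch. The Python below is not the VEC algorithm from the dissertation; it only shows, under assumed parameters (`walks_per_node`, `walk_length`, and `window` are hypothetical names and values), how node co-occurrence counts might be gathered from uniform random walks on a toy graph. Counts of this kind are the raw input that a random walk embedding objective would then fit.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Uniform random walk visiting at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adj[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


def skip_bigram_counts(adj, walks_per_node=10, walk_length=8, window=2, seed=0):
    """Count ordered (node, context) pairs that lie within `window` steps of each other."""
    rng = random.Random(seed)
    counts = Counter()
    for node in sorted(adj):
        for _ in range(walks_per_node):
            walk = random_walk(adj, node, walk_length, rng)
            for i, u in enumerate(walk):
                lo = max(0, i - window)
                hi = min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[(u, walk[j])] += 1
    return counts


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
counts = skip_bigram_counts(adj)
print(counts.most_common(5))
```

Because each walk position is paired with every other position inside its window, the counts are symmetric in the two nodes; with `window=1` only pairs of adjacent nodes are counted.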
42	Sparse Similarity and Network Navigability for Markov Clustering Enhancement Durán Cancino, Claudio Patricio 29 September 2021 Markov clustering (MCL) is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space that simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it has two main drawbacks: (1) its community detection performance in complex networks has been far from that of state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work the two aspects, although closely related, are treated as separate issues and addressed as such. Regarding community detection, a field within network science, the crucial issue is to convert the unweighted network topology into a ‘smart enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here a conceptual innovation is introduced and discussed, focusing on how to leverage notions of network latent geometry to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings hold for both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Regarding the nonlinearity aspect, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science.
Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over their minimum spanning tree, and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL on real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the so-called MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, have been evaluated with positive outcomes. Thus, different nonlinear MCL versions are compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework to estimate nonlinear distances when the kernel is applied in combination with MCL. Indeed, nonlinear MCL variants outperform classical MCL and even state-of-the-art clustering algorithms on different nonlinear datasets. This dissertation discusses the enhancements and the generalized understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability. info:eu-repo/classification/ddc/004 ddc:004
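The stochastic flow procedure behind Markov clustering, as described in entry 42, alternates an expansion step (squaring the column-stochastic matrix) with an inflation step (element-wise powering followed by renormalisation). The sketch below is a plain baseline MCL in pure Python, not the pre-weighted or MC-MCL variants proposed in the dissertation. The inflation value of 2.0, the added self-loops, the fixed iteration count, and the toy graph are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(weights, inflation=2.0, iterations=50):
    """Baseline MCL on a symmetric weight matrix; returns clusters as sorted node lists."""
    n = len(weights)
    # Self-loops are a common convention in MCL; they are an assumption here.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along two-step paths
        m = normalize_columns([[x ** inflation for x in row] for row in m])  # inflation
    # Each node is assigned to the row (attractor) holding most of its column's mass.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
weights = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    weights[u][v] = weights[v][u] = 1.0

print(markov_clustering(weights))
```

Pre-weighting in the sense of the abstract would correspond to replacing the 1.0 entries of `weights` with similarity scores before calling `markov_clustering`; the choice of similarity measure is the subject of the dissertation and is not reproduced here.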
43	Hidden Fear: Evaluating the Effectiveness of Messages on Social Media January 2020 The development of the internet provided new means for people to communicate effectively and share their ideas. In recent years, consumption has shifted away from newspapers and traditional broadcast media toward online social media. Social media has been introduced as a new way of increasing democratic discussion of political and social matters. Among social media, Twitter is widely used by politicians, government officials, communities, and parties to make announcements and reach their followers. This greatly increases the acceptance domain of the medium. The usage of social media during social and political campaigns has been the subject of many social science studies, including studies of the Occupy Wall Street movement, the Arab Spring, the United States (US) election, and more recently the Brexit campaign. The widespread usage of social media in this space and the active participation of people in discussions on social media have made this communication channel a suitable place for spreading propaganda to alter public opinion. An interesting feature of Twitter is the ease with which bots can be programmed to operate on the platform. Social media bots are automated agents engineered to emulate the activity of a human being by tweeting specific content, replying to users, and magnifying certain topics by retweeting them. Networks of these bots are called botnets, a term that describes connected computers running programs that communicate across multiple devices to perform some task. In this thesis, I study how bots can influence opinion, find which parameters play a role in shrinking or coalescing communities, and finally logically prove the effectiveness of each of the hypotheses.
/ Dissertation/Thesis / Masters Thesis Computer Science 2020 Computer science Political science Social psychology Brexit Community Detection Fake News Polarization Social Media Twitter
44	Caractériser et détecter les communautés dans les réseaux sociaux / Characterising and detecting communities in social networks Creusefond, Jean 21 February 2017 In this thesis, I first present a new way of characterising communities from a network of timestamped messages. I show that the structure of this network is linked with communities: communication structures are over-represented inside communities, while diffusion structures appear mainly on the boundaries. Then, I propose to evaluate communities with a new quality function, compacity, that measures the propagation speed of communications in communities. I also present Lex-Clustering, a new community detection algorithm based on the LexDFS graph traversal that reproduces some characteristics of information diffusion models. Finally, I present a methodology for linking quality functions and ground truths. I introduce the concept of contexts, sets of ground truths that are similar in some way. I implemented this methodology in a software tool called CoDACom (Community Detection Algorithm Comparator, codacom.greyc.fr) that also provides a large number of community detection tools. Community detection Ground-truth Graphs Algorithms
45	Modelling Hierarchical Structures in Networks Using Graph Theory : With Application to Knowledge Networks in Graph Curricula Wengle, Emil January 2020 Community detection is a topic in network theory that involves assigning labels to nodes based on some distance measure or centrality index. Detecting communities within a network can be useful for condensing information. In this thesis we explore how to use the approach for pedagogical purposes, and more precisely to condense and visualise the networks of facts, concepts and procedures (also called Knowledge Components (KCs)) that are offered in higher education programmes. In detail, we consider modularity, one of the most common quantities used to evaluate the quality of a community classification. Detecting communities by computing the maximum possible modularity is usually desirable, but this approach is generally infeasible because the associated optimisation problem is NP-complete. This is why practitioners use other algorithms that, instead of computing the optimum, rely on various heuristics to find communities: some use modularity directly, some start from the entire graph and divide it repeatedly, and some contain random elements. This thesis investigates the trade-offs of using different community detection algorithms and variations of the concept of modularity, first in general terms and then for the purpose of identifying communities in knowledge graphs associated with higher education programmes, which can be modelled as directed graphs of KCs. By tweaking and applying these algorithms to both synthetic and field data, we find that the Louvain algorithm is among the better algorithms of those that we considered, mostly thanks to its efficiency. It does not produce a full hierarchy, however, so we recommend Fast Newman if hierarchy is important.
community detection graph theory modularity higher education Computer and Information Sciences Data- och informationsvetenskap
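Modularity, the quality measure discussed in entry 45, can be computed directly for a given partition. The sketch below uses the standard Newman-Girvan definition for an undirected, unweighted graph, which is an assumption: the thesis considers variations of modularity and directed graphs of KCs, and those are not reproduced here. The toy graph and the function name `modularity` are illustrative only.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (e_c / m) - (d_c / (2 m)) ** 2, where m is the
    number of edges, e_c the number of edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    internal = {}      # community -> number of edges with both ends inside it
    total_degree = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_degree.items())


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Heuristics such as Louvain search over partitions to make this quantity large without enumerating every partition, which is why they are used in place of exact maximisation.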
46	Finding the KTH collaboration network : A comparative analysis of Girvan-Newman and Rosvall-Bergstrom's community detection algorithms on KTH's scientist collaboration network / Hitta KTHs sammarbetsnätverk : En analytisk jämförelse av Girvan-Newman och Rosvall-Bergstrom grupperingsdetektionsalgoritmer utförda på KTHs forskningssamarbetsnätverk Eklind, Henry, Gileborg, Robin January 2016 Using DiVA's data on published works and their authors, a database was constructed to which the Girvan-Newman (2002) and Rosvall-Bergstrom (2007) community-finding algorithms were applied. The results were compared in an effort to evaluate the current allocation of researchers at KTH's divisions and the performance of the two algorithms on the collaboration network. Rosvall-Bergstrom performed better than Girvan-Newman in both results and performance. The results of both algorithms are similar, and illustrate that the current allocation of researchers forces work across school borders. community detection girvan-newman rosvall-bergstrom homogeneity heterogeneity Computer Sciences Datavetenskap (datalogi)
47	Towards multivariant pathogenicity predictions: Using machine-learning to directly predict and explore disease-causing oligogenic variant combinations Papadimitriou, Sofia 15 September 2020 The emergence of statistical and predictive methods able to analyse genomic data has revolutionised the field of medical genetics, allowing the identification of disease-causing gene variants (i.e. mutations) for several human genetic diseases. Although these approaches have greatly improved our understanding of Mendelian «one gene – one phenotype» genetic models, studying diseases related to more intricate models that involve causative variants in several genes (i.e. oligogenic diseases) still remains a challenge, either due to the lack of sufficient methodologies and disease-specific cohorts to study or the phenotypic complexity associated with such diseases. This situation makes it difficult not only to understand the genetic mechanisms of the disease, but also to offer proper counseling and support to the patient. Until recently, no specialized predictive methods existed to directly predict causative variant combinations for oligogenic diseases. However, with the advent of data on variant combinations in gene pairs (i.e. bilocus variant combinations) leading to disease, collected in the Digenic Diseases Database (DIDA), we hypothesized that the transition from single-variant to variant-combination pathogenicity predictors is now possible. To investigate this hypothesis, we organised our research along two main routes. First, we developed an interpretable variant combination pathogenicity predictor for gene pairs, called VarCoPP. To this end, we trained multiple Random Forest algorithms on pathogenic bilocus variant combinations from DIDA against neutral data from the 1000 Genomes Project and investigated the contribution of the incorporated variant, gene and gene pair features to the prediction outcome.
In the second part, we explored the usefulness of different gene pair burden scores based on this novel predictive method in discovering oligogenic signatures in neurodevelopmental diseases, which involve a spectrum of monogenic to polygenic cases. We performed a preliminary analysis on the Deciphering Developmental Disorders (DDD) project, containing exome data of 4195 families, and assessed the capability of our scores in supporting already diagnosed monogenic cases, discovering significant pairs compared to control cases, and linking patients in communities based on the genetic burden they share, using the Leiden community detection algorithm. The performance of VarCoPP shows that it is possible to predict disease-causing bilocus variant combinations with good accuracy both during cross-validation and when testing on new cases. We also show its relevance for disease-related gene panels, and enhance its clinical applicability by defining confidence zones that guarantee with 95% or 99% probability that a prediction is indeed a true positive, guiding clinical researchers towards the most relevant results. This method and additional biological annotations are incorporated in an online platform called ORVAL that allows the prediction and exploration of candidate disease-causing oligogenic variant combinations with predicted gene networks, based on patient variant data. Our preliminary analysis on the DDD cohort shows that, although all bilocus burden scores show advantages, disadvantages and certain types of biases, taking the maximum pathogenicity score present inside a gene pair seems to provide, at the moment, the most unbiased results. We also show that our predictive methods enable us to detect patient communities inside DDD, based exclusively on the shared pathogenic bilocus burden between patients, with more than half of these communities containing enriched phenotypic and molecular pathway information.
Our predictive method is also able to bring to the surface genes that are not officially known to be involved in disease but are nevertheless biologically relevant, as well as a few examples of potential oligogenicity inside the network, paving the way for further exploration of oligogenic signatures for neurodevelopmental diseases. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished General computer science Clinical genetics Medical informatics bioinformatics machine-learning oligogenic diseases neurodevelopmental diseases community detection
48	Where you sit matters: diplomatic networks and international conflict Choi, Seulah 10 January 2022 "Where You Sit Matters: Diplomatic Networks and International Conflict" examines how a state's structural position within diplomatic networks influences its foreign policy behaviors, particularly in the domain of international security. Despite the established understanding in International Relations (IR) that relationships among countries matter, there is little empirical knowledge on what exactly the complicated web of those relationships looks like and how it impacts state behavior. Much IR literature tends to focus only on dyadic or multilateral relationships and treat networks as background, which has left a gap in our understanding of how the structures of international networks affect international outcomes. To address this gap, my dissertation uses network analysis and a variety of statistical methods to reveal key structures of diplomatic networks and examine their impacts on a state's foreign policy behavior. My argument extends in three directions. The first part uses a large-n, cross-sectional analysis to examine the impacts of a state's broker position within diplomatic networks on its decision to initiate and escalate militarized interstate disputes (MIDs). Using rare events logit and Heckman selection models, I find that occupying a broker position in diplomatic networks increases a state's likelihood of initiating MIDs over the nearly 200-year period from 1817 to 2001; its marginal impact is nearly twice that of military capability. The second part employs a separable temporal exponential random graph model (STERGM) to examine how key structures of diplomatic networks influence a state's decision to terminate diplomatic ties.
My findings show that the breakdown of diplomatic ties is not a rare event and network dynamics play a role in terminating ties: states take cues from other countries in the network to decide whether or not to terminate diplomatic ties. The last part uses a community detection method, specifically a link communities method, to reveal latent communities of the diplomatic network and identify key countries that belong to multiple communities. I find that the diplomatic network resembles a hierarchical structure in that diplomatic communities tend to overlap; only a small number of major powers simultaneously belong to multiple communities and few communities are independent from those major powers. Political science Community detection Diplomatic networks International conflict Militarized interstate disputes Network analysis STERGM
49	Is the cultural field hypothesis true for Finland? Tapio Bustos, Emanuel January 2023 This study presents an in-depth analysis of spatial voting behavior in Finland at the municipality level, using electoral data from 1983 to 2019. The primary objective is to investigate whether the cultural field hypothesis holds true for Finland. If this hypothesis holds, distinct cultural domains should emerge within Finland. Furthermore, we hypothesized that if the cultural field hypothesis holds true, a distinct community would appear along the Russian border, leading to an east-west partition of Finland. To test the cultural field hypothesis, we perform a spatial correlation analysis and use community detection to find distinct communities within Finland. The spatial correlation analysis suggests the existence of distinct communities in Finland that span approximately 400 km in length. The community detection confirms this, revealing two main communities, a northern and a southern one; in 6 out of 10 elections, three communities emerge. Hence, the cultural field hypothesis holds true for Finland. However, the distribution of these communities does not support the hypothesis that a distinct community would emerge along the Russian border, creating an east-west partition of Finland. Instead, we observe a north-south and a “west/south coastal” partition. Finland cultural field hypothesis spatial correlations community detection Other Mathematics Annan matematik
50	Modelling of Social Networks, Analysis of Community Structures and Disease Simulation on Dairy Cattle : The aim of this study is to model a social network of dairy cows, detect communities and analyse the influence of these substructures on the spread of a contagious disease. By using this network, the transmission of a contagious disease will be simulated through a theoretical simulation function. Macedo, Juliana, Czarnetzki, Mira January 2022 Past research has demonstrated that social networks and community structures have a strong effect on the dynamics of how an infectious disease spreads, although many studies have focused more intensively on the impact of the number of connections an individual has. This thesis investigated how disease transmission is affected by structures of the social network such as communities, centrality measures, cliques, and diameter. For this purpose, a farm located in Sweden with around 200 cows was used, and two social networks were built based on spatial data from a period of 24 h. A hypothetical model was built to simulate disease transmission in two scenarios: one with immunity and one without. The main hypothesis presented here is that if the infection starts in an individual located in the largest community, the disease spreads faster and further; betweenness centrality was used within these communities to choose the individual in which the infection starts. It was also investigated whether there was any preferential aggregation due to parity, which is related to the number of calves a cow has had. It was found that the size of the community did not have a strong influence on the rate of infections, whereas the overall centrality in the whole network and the presence of certain individuals in the cliques seemed to play a bigger role.
The results indicated some preferential relationship due to parity, although it is unclear if there is a specific parity that tends to aggregate more than others. The two networks had different structures, making it difficult to generalize results and make a recommendation to farmers. Social Sciences Interdisciplinary
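The "without immunity" scenario in entry 50 resembles a susceptible-infected (SI) process on a contact network. The sketch below is not the simulation function used in the thesis, whose details are not given in the abstract. It is a minimal discrete-time SI model under stated assumptions: a fixed per-contact transmission probability `p_transmit` (a hypothetical parameter), one seed individual, and a toy graph standing in for the cow network.

```python
import random


def simulate_si(adj, seed_node, p_transmit, steps, rng):
    """Discrete-time SI process; returns the number of infected nodes after each step."""
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                # Each infected-susceptible contact transmits independently.
                if v not in infected and rng.random() < p_transmit:
                    newly_infected.add(v)
        infected |= newly_infected
        history.append(len(infected))
    return history


# Toy contact network (an assumption for illustration): two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

# Seeding at the bridging node 2 versus the peripheral node 0.
print(simulate_si(adj, 2, 0.5, 6, random.Random(1)))
print(simulate_si(adj, 0, 0.5, 6, random.Random(1)))
```

Comparing runs seeded at different individuals, for example the node with the highest betweenness centrality in a community versus a peripheral node, is one way to probe the kind of hypothesis the abstract describes. A scenario with immunity would need an additional recovered state.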
