Global ETD Search

31	Detection and Analysis of Online Extremist Communities Benigni, Matthew Curran 01 May 2017 Online social networks have become a powerful venue for political activism. In many cases large, insular online communities form that have been shown to be powerful diffusion mechanisms of both misinformation and propaganda. In some cases users in these groups advocate actions or policies that could be construed as extreme along nearly any distribution of opinion, and the groups are thus called Online Extremist Communities (OECs). Although these communities appear increasingly common, little is known about how these groups form or the methods used to influence them. The work in this thesis provides researchers a methodological framework to study these groups by answering three critical research questions: How can we detect large dynamic online activist or extremist communities? What automated tools are used to build, isolate, and influence these communities? What methods can be used to gain novel insight into large online activist or extremist communities? These group members' social ties can be inferred based on the various affordances offered by OSNs for group curation. By developing heterogeneous, annotated graph representations of user behavior I can efficiently extract online activist discussion cores using an ensemble of unsupervised machine learning methods. I call this technique Ensemble Agreement Clustering. Through manual inspection, these discussion cores can then often be used as training data to detect the larger community. I present a novel supervised learning algorithm called Multiplex Vertex Classification for network bipartition on heterogeneous, annotated graphs. This methodological pipeline has also proven useful for social botnet detection, and a study of large, complex social botnets used for propaganda dissemination is provided as well. 
Throughout this thesis I provide Twitter case studies including communities focused on the Islamic State of Iraq and al-Sham (ISIS), the ongoing Syrian Revolution, the Euromaidan Movement in Ukraine, as well as the alt-Right. Covert Network Detection Community Detection Annotated Networks Multilayer Networks Heterogeneous Networks Spectral Clustering
32	Predição de links em redes complexas utilizando informações de estruturas de comunidades / Link prediction in complex networks using community structure information Jorge Carlos Valverde Rebaza 27 March 2013 Different real-world systems can be represented as networks. Networks are structures in which vertices (nodes) represent entities and links represent relationships between these entities. Moreover, networks are dynamic structures, which implies the rapid appearance and disappearance of entities and their relationships. In this scenario, the link prediction problem is to predict the future occurrence of a link that does not yet exist between a pair of vertices, based on existing information. Link prediction matters because of its applications in areas such as information retrieval, the identification of spurious interactions, and the evaluation of network evolution mechanisms. To address the link prediction problem, most methods use information from the topological neighborhood to assign a value that represents the likelihood of a future connection between a pair of vertices. However, hybrid methods have appeared recently. These methods use additional information beyond the topological neighborhood, most commonly community information. Communities are groups of vertices that are densely connected among themselves and sparsely connected to vertices from other groups, so they provide information that can be useful for determining the future behavior of networks. 
This research therefore presents two proposals for link prediction based on community information. The first proposal is a new similarity index that uses information about the vertices in the neighborhood of an analyzed pair of vertices that belong to the same community, as well as information about the vertices in that neighborhood that belong to different communities. The second proposal is a set of indices obtained by reformulating several existing proposals so that they use only the information from vertices belonging to the same community in the topological neighborhood of the analyzed pair of vertices. Experiments conducted on ten complex networks from different domains show that, in general, the proposed indices outperform the usual approaches. Community detection Complex networks Link prediction
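The second proposal in this abstract restricts neighborhood-based indices to neighbors that share a community with the analyzed pair. A minimal sketch of that idea follows, using a plain common-neighbors count as the base index. The function names, the toy graph, and the rule that a pair in different communities scores zero are illustrative assumptions, not the indices defined in the thesis.

```python
from itertools import combinations


def within_community_common_neighbors(adj, community, u, v):
    """Count common neighbors of u and v that lie in the same community as both.

    Assumption for this sketch: pairs in different communities score 0.
    """
    if community[u] != community[v]:
        return 0
    shared = adj[u] & adj[v]
    return sum(1 for w in shared if community[w] == community[u])


def rank_candidate_links(adj, community):
    """Score every non-adjacent pair and return them sorted by descending score."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # the link already exists, nothing to predict
        scores[(u, v)] = within_community_common_neighbors(adj, community, u, v)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy graph: nodes 0-3 form community "A", nodes 4-5 form "B".
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4},
}
community = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}

for pair, score in rank_candidate_links(adj, community)[:3]:
    print(pair, score)
```

In this toy graph the pair (0, 3) ranks first because both of its common neighbors, 1 and 2, belong to the same community as the pair.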
33	FAST COMMUNITY STRUCTURE ANALYSIS OF CALL GRAPHS FOR MALWARE DETECTION Pooja Patil (6636122) 15 May 2019 The use of graph-structured data in applications is increasing day by day. In order to infer useful information from such data, fast analytics and software tools are required. One of the graph analytics techniques used is community detection. Community detection is the technique of finding structural communities within a graph. Such communities are defined as groups of highly connected nodes that are similar to each other. This research proposes a parallel heuristic for faster community detection using the parallel version of the Louvain algorithm: Grappolo. The Louvain algorithm is a hierarchical algorithm that focuses on modularity optimization. It gained popularity because of its ability to detect high-quality communities faster than the other existing community detection algorithms. However, the Louvain algorithm is a sequential algorithm. To reduce the execution time of the Louvain algorithm, a parallel version named Grappolo exists in the literature. Grappolo proposes parallel heuristics that address the challenges that arise when parallelizing the sequential Louvain algorithm. This study investigates whether Grappolo can be parallelized further to reduce the execution time while maintaining the quality of the communities detected. To evaluate the proposed heuristic, it was tested in an OpenMP multithreaded environment. It was applied to the source code of Android malware applications. However, compared to Grappolo, the proposed modified version resulted in higher execution times for the inputs tested. The modularity of the communities detected was similar to that of the Grappolo implementation. Computer Engineering community detection methods Call graphs Louvain algorithm Grappolo algorithm
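The Louvain algorithm mentioned in this abstract works by optimizing modularity. As a hedged illustration of that objective, the sketch below computes modularity for a fixed partition of an undirected, unweighted graph. It is not the Grappolo implementation or the heuristic proposed in the thesis; the function name and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph at the bridge scores higher than keeping everything in one community, which is the kind of improvement a modularity optimizer searches for.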
34	De novo Population Discovery from Complex Biological Datasets Venkatasubramanian, Meenakshi 01 October 2019 No description available. Computer Science Clustering Alternative Splicing Bioinformatics Non-Negative Matrix Factorization Data Mining Community Detection
35	Unsupervised random walk node embeddings for network block structure representation Lin, Christy 25 September 2021 There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible to apply tools and algorithms from multivariate statistics and machine learning, which were developed for Euclidean-space data, to network data. Random walk node embeddings are a class of recently developed node embedding techniques where the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, their properties remain poorly understood. This dissertation studies random walk based node embeddings in an unsupervised setting within the context of capturing hidden block structure in the network, i.e., learning node representations that reflect their patterns of adjacencies to other nodes. This doctoral research (i) develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance for the community detection problem under the Stochastic Block Model (SBM); (ii) characterizes the ergodic limits of the embedding objectives to create non-randomized versions; and (iii) analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large network asymptotic regime. Comprehensive experimental results on real world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms. 
As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Grammian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases. / 2023-09-24T00:00:00Z Computer science Community detection Node embeddings Random walk Stochastic block model
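The abstract describes embedding objectives built on skip-bigram statistics computed from random walks. The sketch below shows one generic way such statistics can be collected: simulate uniform random walks and count (center, context) pairs that fall within a window. It is not the VEC algorithm; the function name, the parameters, and the toy graph are assumptions made for illustration.

```python
import random
from collections import Counter


def skip_bigram_counts(adj, walks_per_node=10, walk_length=5, window=2, seed=0):
    """Count ordered (center, context) pairs seen within `window` steps of each
    other on uniform random walks started from every node."""
    rng = random.Random(seed)
    counts = Counter()
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            # Extend the walk by moving to a uniformly chosen neighbor.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            # Record every pair of positions at most `window` steps apart.
            for i, center in enumerate(walk):
                lo = max(0, i - window)
                hi = min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[(center, walk[j])] += 1
    return counts


# Hypothetical toy graph: two disconnected triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
counts = skip_bigram_counts(adj)
print(sum(counts.values()))  # total number of (center, context) pairs
```

Because the two triangles are disconnected, no pair ever links a node from one triangle to a node from the other, so the counts already reflect the block structure that an embedding objective would then try to capture.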
36	Sparse Similarity and Network Navigability for Markov Clustering Enhancement Durán Cancino, Claudio Patricio 29 September 2021 Markov clustering (MCL) is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space that simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it presents two main drawbacks: (1) its community detection performance in complex networks falls well short of state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work both aspects, although closely related, are treated as separate issues and addressed as such. Regarding community detection, a field under the umbrella of network science, the crucial issue is to convert the unweighted network topology into a ‘smart enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here a conceptual innovation is introduced and discussed, focusing on how to leverage notions of network latent geometry in order to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings hold for both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Regarding the nonlinearity aspect, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science. 
Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over their minimum spanning tree, and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL in real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the so-called MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, have been evaluated with positive outcomes. Thus, different nonlinear MCL versions are compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework to estimate nonlinear distances when its kernel is applied in combination with MCL. Indeed, nonlinear MCL variants outperform classical MCL and even state-of-the-art clustering algorithms on different nonlinear datasets. This dissertation discusses the enhancements and the generalized understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability. info:eu-repo/classification/ddc/004 ddc:004
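The stochastic flow procedure behind Markov clustering alternates two steps, expansion and inflation, on a column-stochastic matrix. The sketch below is a minimal version of classical MCL, assuming NumPy is available. It is not the MC-MCL variant or the pre-weighting strategy from the dissertation; the function name, the default parameters, and the toy graph are assumptions. The input matrix may be unweighted or pre-weighted with any nonnegative similarity.

```python
import numpy as np


def markov_clustering(adjacency, inflation=2.0, max_iter=100, tol=1e-8):
    """Minimal classical MCL on a nonnegative (optionally pre-weighted) adjacency matrix."""
    matrix = np.asarray(adjacency, dtype=float)
    matrix = matrix + np.eye(matrix.shape[0])           # self-loops keep every column nonzero
    matrix = matrix / matrix.sum(axis=0, keepdims=True)  # make columns sum to 1
    for _ in range(max_iter):
        previous = matrix
        matrix = matrix @ matrix                         # expansion: flow spreads along paths
        matrix = matrix ** inflation                     # inflation: strong flows are boosted
        matrix = matrix / matrix.sum(axis=0, keepdims=True)
        if np.abs(matrix - previous).max() < tol:
            break
    # Rows that retain flow act as attractors; the columns they attract form a cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members:
            clusters.add(members)
    return clusters


# Hypothetical toy graph: two disconnected triangles.
triangle = np.ones((3, 3)) - np.eye(3)
adjacency = np.block([[triangle, np.zeros((3, 3))], [np.zeros((3, 3)), triangle]])
print(markov_clustering(adjacency))
```

Raising `inflation` generally yields finer clusters, while the weights placed in the adjacency matrix decide where the flow concentrates, which is the lever the pre-weighting approach described in the abstract acts on.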
37	Hidden Fear: Evaluating the Effectiveness of Messages on Social Media January 2020 The development of the internet provided new means for people to communicate effectively and share their ideas. In recent years, consumption has shifted away from newspapers and traditional broadcast media toward online social media. Social media has been introduced as a new way of increasing democratic discussion of political and social matters. Among social media, Twitter is widely used by politicians, government officials, communities, and parties to make announcements and reach their followers. This greatly broadens the reach and acceptance of the medium. The use of social media during social and political campaigns has been the subject of many social science studies, covering the Occupy Wall Street movement, the Arab Spring, the United States (US) election, and, more recently, the Brexit campaign. The widespread use of social media in this space and the active participation of people in discussions on social media make this communication channel a suitable place for spreading propaganda to alter public opinion. An interesting feature of Twitter is the ease with which bots can be programmed to operate on the platform. Social media bots are automated agents engineered to emulate the activity of a human being by tweeting specific content, replying to users, and amplifying certain topics by retweeting them. Networks of these bots are called botnets, a term that describes connected computers running programs that communicate across multiple devices to perform a task. In this thesis, I study how bots can influence opinion, identify which parameters play a role in shrinking or coalescing communities, and finally logically prove the effectiveness of each of the hypotheses. 
/ Dissertation/Thesis / Masters Thesis Computer Science 2020 Computer science Political science Social psychology Brexit Community Detection Fake News Polarization Social Media Twitter
38	Caractériser et détecter les communautés dans les réseaux sociaux / Characterising and detecting communities in social networks Creusefond, Jean 21 February 2017 In this thesis, I first present a new way of characterising communities from a network of timestamped messages. I show that the structure of this network is linked with communities: communication structures are over-represented inside communities, while diffusion structures appear mainly on the boundaries. Then, I propose to evaluate communities with a new quality function, compacity, that measures the propagation speed of communications in communities. 
I also present Lex-Clustering, a new community detection algorithm based on the LexDFS graph traversal that reproduces some characteristics of information diffusion models. Finally, I present a methodology for linking quality functions and ground truths. I introduce the concept of contexts, sets of ground truths that are similar in some way. I implemented this methodology in software called CoDACom (Community Detection Algorithm Comparator, codacom.greyc.fr), which also provides many community detection tools. Community detection Ground-truth Graphs Algorithms
39	Towards multivariant pathogenicity predictions: Using machine-learning to directly predict and explore disease-causing oligogenic variant combinations Papadimitriou, Sofia 15 September 2020 The emergence of statistical and predictive methods able to analyse genomic data has revolutionised the field of medical genetics, allowing the identification of disease-causing gene variants (i.e. mutations) for several human genetic diseases. Although these approaches have greatly improved our understanding of Mendelian «one gene – one phenotype» genetic models, studying diseases related to more intricate models that involve causative variants in several genes (i.e. oligogenic diseases) still remains a challenge, either due to the lack of sufficient methodologies and disease-specific cohorts to study or the phenotypic complexity associated with such diseases. This situation makes it difficult not only to understand the genetic mechanisms of the disease, but also to offer proper counseling and support to the patient. Until recently, no specialized predictive methods existed to directly predict causative variant combinations for oligogenic diseases. However, with the advent of data on variant combinations in gene pairs (i.e. bilocus variant combinations) leading to disease, collected at the Digenic Diseases Database (DIDA), we hypothesized that the transition from single to variant combination pathogenicity predictors is now possible. To investigate this hypothesis, we organised our research along two main routes. First, we developed an interpretable variant combination pathogenicity predictor, called VarCoPP, for gene pairs. For this goal, we trained multiple Random Forest algorithms on pathogenic bilocus variant combinations from DIDA against neutral data from the 1000 Genomes Project and investigated the contribution of the incorporated variant, gene and gene pair features to the prediction outcome. 
In the second part, we explored the usefulness of different gene pair burden scores based on this novel predictive method, in discovering oligogenic signatures in neurodevelopmental diseases, which involve a spectrum of monogenic to polygenic cases. We performed a preliminary analysis on the Deciphering Developmental Disorders (DDD) project containing exome data of 4195 families and assessed the capability of our scores in supporting already diagnosed monogenic cases, discovering significant pairs compared to control cases and linking patients in communities based on the genetic burden they share, using the Leiden community detection algorithm. The performance of VarCoPP shows that it is possible to predict disease-causing bilocus variant combinations with good accuracy both during cross-validation and when testing on new cases. We also show its relevance for disease-related gene panels, and enhanced its clinical applicability by defining confidence zones that guarantee with 95% or 99% probability that a prediction is indeed a true positive, guiding clinical researchers towards the most relevant results. This method and additional biological annotations are incorporated in an online platform called ORVAL that allows the prediction and exploration of candidate disease-causing oligogenic variant combinations with predicted gene networks, based on patient variant data. Our preliminary analysis on the DDD cohort shows that, although all bi-locus burden scores show advantages, disadvantages and certain types of biases, taking the maximum pathogenicity score present inside a gene pair seems to provide, at the moment, the most unbiased results. We also show that our predictive methods enable us to detect patient communities inside DDD, based exclusively on the shared pathogenic bi-locus burden between patients, with more than half of these communities containing enriched phenotypic and molecular pathway information. 
Our predictive method is also able to bring to the surface genes that are not officially known to be involved in disease but nevertheless have biological relevance, as well as a few examples of potential oligogenicity inside the network, paving the way for further exploration of oligogenic signatures for neurodevelopmental diseases. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished General computer science Clinical genetics Medical informatics bioinformatics machine-learning oligogenic diseases neurodevelopmental diseases community detection
40	Where you sit matters: diplomatic networks and international conflict Choi, Seulah 10 January 2022 "Where You Sit Matters: Diplomatic Networks and International Conflict" examines how a state's structural position within diplomatic networks influences its foreign policy behaviors, particularly in the domain of international security. Despite the established understanding in International Relations (IR) that relationships among countries matter, there is little empirical knowledge on what exactly the complicated web of those relationships looks like and how it impacts state behavior. Much IR literature tends to focus only on dyadic or multilateral relationships and treat networks as background, which has left a gap in our understanding of how the structures of international networks affect international outcomes. To address this gap, my dissertation uses network analysis and a variety of statistical methods to reveal key structures of diplomatic networks and examine their impacts on a state's foreign policy behavior. My argument extends in three directions. The first part uses a large-n, cross-sectional analysis to examine the impacts of a state's broker position within diplomatic networks on its decision to initiate and escalate militarized interstate disputes (MIDs). By using the rare events logit and Heckman selection models, I find that occupying a broker position in diplomatic networks increases a state's likelihood of initiating MIDs over the nearly 200-year period from 1817 to 2001; its marginal impact is nearly twice that of military capability. The second part employs a separable temporal exponential random graph model (STERGM) to examine how key structures of diplomatic networks influence a state's decision to terminate diplomatic ties. 
My findings show that the breakdown of diplomatic ties is not a rare event and network dynamics play a role in terminating ties: states take cues from other countries in the network to decide whether or not to terminate diplomatic ties. The last part uses a community detection method, specifically a link communities method, to reveal latent communities of the diplomatic network and identify key countries that belong to multiple communities. I find that the diplomatic network resembles a hierarchical structure in that diplomatic communities tend to overlap; only a small number of major powers simultaneously belong to multiple communities and few communities are independent from those major powers. Political science Community detection Diplomatic networks International conflict Militarized interstate disputes Network analysis STERGM
