Global ETD Search

1	Unsupervised random walk node embeddings for network block structure representation Lin, Christy 25 September 2021 There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences over the last few decades. Node embeddings, i.e., Euclidean-space representations of the nodes of a network, make it possible to apply tools and algorithms from multivariate statistics and machine learning, originally developed for Euclidean-space data, to network data. Random walk node embeddings are a recently developed class of node embedding techniques in which the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems, such as link prediction and node classification, and have demonstrated state-of-the-art performance. Yet their properties remain poorly understood. This dissertation studies random walk based node embeddings in an unsupervised setting, in the context of capturing hidden block structure in the network, i.e., learning node representations that reflect each node's pattern of adjacencies to other nodes. This doctoral research (i) develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance on the community detection problem under the Stochastic Block Model (SBM); (ii) characterizes the ergodic limits of the embedding objectives to create non-randomized versions; and (iii) analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large-network asymptotic regime. Comprehensive experimental results on real-world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms.
As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Gramian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like and centered, within each community, at the node embeddings of the expected network, and that the embeddings concentrate in the linear degree-scaling regime as the number of nodes increases. / 2023-09-24T00:00:00Z Computer science Community detection Node embeddings Random walk Stochastic block model
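The skip-bigram statistics that random walk embeddings are trained on can be illustrated with a short Python sketch. It assumes uniform (unbiased) walks, a fixed window size, and a toy two-community SBM; the names `sample_sbm`, `random_walks`, and `skip_bigram_counts` and all parameter values are hypothetical, and this is not the dissertation's VEC implementation or objective.

```python
import random
from collections import Counter


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample an undirected Stochastic Block Model graph (hypothetical helper).

    Returns adjacency lists {node: [neighbours]} and a list of community labels.
    """
    labels = [c for c, n in enumerate(sizes) for _ in range(n)]
    adj = {v: [] for v in range(len(labels))}
    for u in range(len(labels)):
        for v in range(u + 1, len(labels)):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj, labels


def random_walks(adj, walk_length, walks_per_node, rng):
    """Uniform random walks of up to walk_length nodes from every start node.

    A walk stops early if it reaches a node with no neighbours.
    """
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def skip_bigram_counts(walks, window):
    """Count ordered pairs (u, v) where v appears 1..window steps after u in a walk."""
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(i + 1, min(i + window + 1, len(walk))):
                counts[(u, walk[j])] += 1
    return counts


rng = random.Random(0)
adj, labels = sample_sbm([50, 50], p_in=0.3, p_out=0.02, rng=rng)
walks = random_walks(adj, walk_length=10, walks_per_node=5, rng=rng)
counts = skip_bigram_counts(walks, window=3)
within = sum(n for (u, v), n in counts.items() if labels[u] == labels[v])
print(f"within-community share of skip-bigrams: {within / sum(counts.values()):.2f}")
```

Because walks on an SBM with strong within-community connectivity rarely cross between blocks, most skip-bigrams pair nodes from the same community; this is the kind of signal an embedding objective built on these counts can turn into block structure.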
2	Methods, algorithms and impossibility results for machine learning on graphs Sotiropoulos, Konstantinos 03 February 2025 2023 / In recent years, there has been a remarkable increase in the use of machine learning techniques for analyzing graphs and their associated applications, such as node classification, link prediction, community detection, and generating new graph instances with desired characteristics. This motivates the development of innovative and effective algorithms, as well as an exploration of the potential and limitations of modern deep learning techniques, which have garnered considerable attention. This dissertation makes contributions in both of these areas. First, we propose innovative and scalable methods that rely solely on local node information for both unsupervised and supervised graph learning tasks. Specifically, we emphasize the significance of local triangle counts in community detection and introduce a novel triangle-aware spectral sparsification algorithm that improves the efficiency of this task. Second, we analyze a Twitter dataset and create a supervised learning framework that leverages the multiple layers of interaction among Twitter users, resulting in more precise prediction of new links among them. The emergence of deep learning has sparked interest in unsupervised node embeddings, low-dimensional vector representations of nodes that have become the primary tool in many graph-based machine learning tasks. A fundamental question arises: can real-world networks be accurately represented in a low-dimensional space? We contribute to the understanding of node embeddings in two significant ways. First, we prove that any graph with bounded maximum degree can be embedded in low dimensions, and we offer an algorithm that accurately embeds real-world networks in a few dimensions, typically on the order of tens.
Second, we explore contemporary embedding techniques and find that their embeddings are not always exact, as different graphs can have similar low-dimensional representations. Despite this lack of exactness, these methods encode sufficient information for high performance on node classification tasks. Finally, we study graph generative models under a novel criterion: their ability to generate graphs that are simultaneously edge-diverse and rich in small dense subgraphs. We show the limitations of edge-independent graph generative models and develop a hierarchy of models that are progressively better at mimicking real-world networks. We complement our analysis with simple baseline methods based on dense subgraph detection that perform competitively against more complex methods. Computer science Generative Models Graphs Link Prediction Machine Learning Node Embeddings Sparsification
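Local triangle counts need only each node's neighbourhood, which is what makes them cheap to compute. The Python sketch below counts triangles per node and per edge. The abstract does not say how the triangle-aware sparsification uses these counts, so the edge-level count (common neighbours of an edge's endpoints) is shown only as one plausible input; the function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (self-loops are ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_triangle_counts(adj):
    """Triangles through each node = adjacent pairs among its neighbours."""
    return {
        v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def edge_triangle_counts(adj):
    """Triangles through each edge (u, v), u < v = number of common neighbours."""
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
print(local_triangle_counts(adj))  # {0: 1, 1: 1, 2: 1, 3: 0}
```

The pendant edge (2, 3) lies on no triangle, so a triangle-based weighting would treat it differently from the edges inside the dense triple.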
3	A Bridge between Graph Neural Networks and Transformers: Positional Encodings as Node Embeddings Manu, Bright Kwaku 01 December 2023 Graph Neural Networks (GNNs) and Transformers are powerful frameworks for machine learning tasks. Although they evolved separately in different fields, recent research has revealed similarities and links between them. This work bridges the gap between GNNs and Transformers by offering a unified framework that highlights their similarities and differences. We compute positional encodings and identify the key properties that allow positional encodings to serve as node embeddings. We found that the properties of expressiveness, efficiency, and interpretability were achieved in the process. We show that positional encodings can be used as node embeddings for machine learning tasks such as node classification, graph classification, and link prediction. We discuss some challenges and provide future directions. message passing graph convolution transformer node embeddings positional encodings Artificial Intelligence and Robotics Data Science Discrete Mathematics and Combinatorics Theory and Algorithms
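The abstract does not name the positional encodings it uses. One widely used choice is the Laplacian eigenvector encoding, sketched below in Python with NumPy as an assumption. The rows of the returned matrix can be read directly as node embeddings. The helper name `laplacian_positional_encoding` and the toy graph are hypothetical.

```python
import numpy as np


def laplacian_positional_encoding(adj_matrix, k):
    """Laplacian eigenvector positional encodings (one k-dimensional row per node).

    Uses the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} and keeps the
    eigenvectors of the k smallest eigenvalues after the trivial one.
    """
    A = np.asarray(adj_matrix, dtype=float)
    deg = A.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor instead of a division by zero.
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return eigvecs[:, 1:k + 1]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
pe = laplacian_positional_encoding(A, k=2)
print(pe.shape)  # (6, 2)
```

The first encoding dimension separates the two triangles by sign. Eigenvector signs are arbitrary, so a model that consumes these encodings has to tolerate sign flips.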
4	Aplikace metody učení bez učitele na hledání podobných grafů / Application of Unsupervised Learning Methods in Graph Similarity Search Sabo, Jozef January 2021 The goal of this master's thesis, carried out in cooperation with the company Avast, was to design a system that can extract knowledge from a database of graphs. The graphs used for data mining describe the behaviour of computer systems and are inserted anonymously into the company's database from the systems of users of the company's products. Each graph in the database can be assigned one of two labels: clean or malware (malicious). The task of the proposed self-learning system is to find clusters of graphs in the graph database in which the classes of graphs do not mix. Graph clusters containing only one class of graphs can be interpreted as different types of clean or malware graphs and are a useful source for further analysis. To evaluate the quality of the clusters, a custom metric named monochromaticity was designed. The metric evaluates the quality of the clusters based on how much clean and malware graphs are mixed within them. The best results on this metric were obtained when vector representations of graphs were created by a deep learning model (a variational graph autoencoder with two relational graph convolution operators) and the parameterless MeanShift method was used to cluster the vectors.
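The abstract does not give the formula for monochromaticity. The Python sketch below therefore uses two hypothetical stand-ins that capture the stated idea of measuring how much clean and malware graphs mix within clusters. The first is the size-weighted share of graphs that carry their cluster's majority label. The second is the share of graphs that sit in single-label clusters. Neither is claimed to be the thesis's metric.

```python
from collections import Counter, defaultdict


def group_labels(cluster_ids, labels):
    """Collect the labels of the graphs assigned to each cluster."""
    clusters = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        clusters[cluster].append(label)
    return clusters


def majority_share(cluster_ids, labels):
    """Hypothetical stand-in: share of graphs carrying their cluster's majority label."""
    clusters = group_labels(cluster_ids, labels)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in clusters.values())
    return majority / len(labels)


def pure_cluster_share(cluster_ids, labels):
    """Hypothetical stand-in: share of graphs in clusters that contain one label only."""
    clusters = group_labels(cluster_ids, labels)
    pure = sum(len(ys) for ys in clusters.values() if len(set(ys)) == 1)
    return pure / len(labels)


cluster_ids = [0, 0, 1, 1, 1]
labels = ["clean", "clean", "malware", "malware", "clean"]
print(majority_share(cluster_ids, labels))      # 0.8
print(pure_cluster_share(cluster_ids, labels))  # 0.4
```

In a pipeline like the one described, `cluster_ids` could come from scikit-learn's `MeanShift().fit_predict(X)` applied to the graph vectors.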
5	On Higher Order Graph Representation Learning Balasubramaniam Srinivasan (12463038) 26 April 2022 Research on graph representation learning (GRL) has made major strides over the past decade, with widespread applications in domains such as e-commerce, personalization, fraud and abuse, life sciences, and social network analysis. Despite this success, fundamental questions about practices employed in modern GRL have remained unanswered. Unraveling and advancing two such fundamental questions forms the overarching theme of my thesis. The first part of my thesis deals with the mathematical foundations of GRL. GRL is used to solve tasks such as node classification, link prediction, clustering, and graph classification, albeit with seemingly different frameworks (e.g., graph neural networks for node and graph classification, and (implicit) matrix factorization for link prediction and clustering). The existence of very distinct frameworks for different graph tasks has puzzled researchers and practitioners alike. In my thesis, using group theory, I provide a theoretical blueprint that connects these seemingly different frameworks, bridging methods such as matrix factorization and graph neural networks. With this renewed understanding, I then provide guidelines to better realize the full capabilities of these methods across a multitude of tasks. The second part of my thesis deals with cases where modeling real-world objects as a graph is an oversimplified description of the underlying data. Specifically, I look at two such cases: (i) modeling hypergraphs (where edges can encompass two or more vertices) and (ii) using GRL to predict protein properties. Towards (i), I develop a hypergraph neural network that takes advantage of the inherent sparsity of real-world hypergraphs without unduly sacrificing its ability to distinguish non-isomorphic hypergraphs.
The designed hypergraph neural network is then leveraged to learn expressive representations of hyperedges for two tasks, namely hyperedge classification and hyperedge expansion. Experiments show that using this network improves performance over the current approach of converting the hypergraph into a dyadic graph and applying (dyadic) GRL frameworks. Towards (ii), I introduce the concept of conditional invariances and leverage it to model the inherent flexibility present in proteins. Using conditional invariances, I provide a new GRL framework that can capture protein-dependent conformations and ensures that all viable conformers of a protein obtain the same representation. Experiments show that endowing existing GRL models with this framework yields noticeable improvements on multiple protein datasets and tasks. Applied Computer Science Machine Learning Neural Networks Deep Learning Artificial Intelligence Computer Science Graphs Invariances Hypergraphs Proteins Conditional Invariances Graph Representation Learning Protein Representation Learning Hypergraph Representation Learning Structural Graph Representations Positional Node Embeddings Protein Conformers
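A common way to convert a hypergraph into a dyadic graph is clique expansion, which links every pair of vertices that share a hyperedge. Treating it as the conversion the thesis compares against is an assumption, since the abstract does not name the method. The Python sketch below shows why the conversion can oversimplify: a single 3-vertex hyperedge and three ordinary pairwise edges expand to the same graph.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Dyadic graph in which every pair of vertices sharing a hyperedge is linked.

    Returns a set of undirected edges (u, v) with u < v.
    """
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


# One 3-vertex hyperedge versus three ordinary pairwise edges:
one_hyperedge = [(0, 1, 2)]
three_edges = [(0, 1), (1, 2), (0, 2)]
print(clique_expansion(one_hyperedge) == clique_expansion(three_edges))  # True
```

Because both inputs collapse to the same triangle, any dyadic GRL model applied after this expansion cannot tell them apart. A hypergraph neural network that operates on hyperedges directly avoids this information loss.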
6	Dynamic Network Modeling from Temporal Motifs and Attributed Node Activity Giselle Zeno (16675878) 26 July 2023 The most important networks in domains such as computing, organizations, economics, social science, academia, and biology are networks that change over time. For example, an organization has email and collaboration networks (e.g., different people or teams working on a document). Apart from their connectivity changing over time, these networks can contain attributes such as the topic of an email or message, the contents of a document, or the interests of a person in an academic citation or social network. Analyzing these dynamic networks can be critical in decision-making processes. For instance, in an organization, insight into how people from different teams collaborate provides important information that can be used to optimize workflows. Network generative models provide a way to study and analyze networks. For example, model performance and generalization in tasks like node classification can be benchmarked by evaluating models on synthetic networks generated with varying structure and attribute correlation. In this work, we begin by presenting a systematic study of the impact that graph structure and attribute auto-correlation have on node classification using collective inference. This is the first time such an extensive study has been done. We take advantage of a recently developed method that samples attributed networks (albeit static ones) with varying network structure jointly with correlated attributes.
We find that the graph connectivity that contributes to network auto-correlation (i.e., the local relationships of nodes) and the network density have the highest impact on the performance of collective inference methods. Most of the literature to date has focused on static representations of networks, partly due to the difficulty of finding readily available datasets of dynamic networks. Dynamic network generative models can bridge this gap by generating synthetic graphs similar to observed real-world networks. Given that motifs have been established as building blocks of the structure of real-world networks, modeling them can help generate the observed graph structure and capture correlations in node connections and activity. Therefore, we continue with a study of motif evolution in dynamic temporal graphs. Our key insight is that motifs rarely change configurations in fast-changing dynamic networks (e.g., wedges into triangles, and vice versa), but rather keep reappearing at different times in the same configuration. This finding motivates the generative process of our proposed models, which use temporal motifs as building blocks to generate dynamic graphs with links that appear and disappear over time. Our first proposed model generates dynamic networks based on motif activity and the roles that nodes play in a motif. For example, a wedge is sampled based on the likelihood of one node having the role of hub and the other two nodes being the spokes. Our model learns all parameters from observed data, with the goal of producing synthetic graphs with similar graph structure and node behavior.
We find that using motifs and node roles helps our model generate the more complex structures and temporal node behavior seen in real-world dynamic networks. After observing that motif node roles help capture the changing local structure and behavior of nodes, we extend our work to also consider the attributes generated by nodes' activities. We propose a second generative model for attributed dynamic networks that (i) captures network structure dynamics through temporal motifs, and (ii) extends the structural roles of nodes in motifs to roles that generate content embeddings. This model is the first to generate synthetic dynamic networks and sample content embeddings based on motif node roles. To the best of our knowledge, it is the only attributed dynamic network model that can generate new content embeddings, not observed in the input graph but still similar to those of the input graph. Our results show that modeling network attributes with higher-order structures (e.g., motifs) improves the quality of the generated networks. The proposed generative models address the difficulty of finding readily available datasets of dynamic networks, attributed or not. This work will also allow others to (i) generate networks that they can share without divulging individuals' private data, (ii) benchmark model performance, and (iii) explore model generalization under a broader range of conditions, among other uses.
Finally, the proposed evaluation measures will help elucidate how models behave, allowing fellow researchers to push forward in these domains. Modelling and simulation Data mining and knowledge discovery Graph, social and multimedia data Neural networks Graph Machine Learning network evolution model temporal graph model Dynamic Networks, Attributed Graphs Social network analysis tools convolutional neural network (CNN) graph convolutional network (GCN) node embeddings language model bert Collective classification Collective inference Node classification model evaluation techniques synthetic networks BERT models pre-trained language models
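The wedge and triangle configurations and the hub/spoke roles described above can be made concrete with a small Python sketch. It tracks one node triple across timestamps and reports its configuration at each. It assumes edges are already bucketed into discrete timestamps. The helpers `classify_triple` and `motif_timeline` are hypothetical, and the sketch does not reproduce how the proposed models sample motifs.

```python
from collections import defaultdict
from itertools import combinations


def classify_triple(snapshot_edges, triple):
    """Classify the subgraph induced on three nodes in one snapshot.

    Returns ("triangle", None), ("wedge", hub) or ("other", None); in a wedge the
    hub is the node linked to both spokes.
    """
    present = {frozenset(e) for e in snapshot_edges}
    degree = {v: 0 for v in triple}
    for u, v in combinations(triple, 2):
        if frozenset((u, v)) in present:
            degree[u] += 1
            degree[v] += 1
    n_edges = sum(degree.values()) // 2
    if n_edges == 3:
        return "triangle", None
    if n_edges == 2:
        hub = next(v for v, d in degree.items() if d == 2)
        return "wedge", hub
    return "other", None


def motif_timeline(temporal_edges, triple):
    """Configuration of a node triple at every timestamp present in (u, v, t) data."""
    snapshots = defaultdict(list)
    for u, v, t in temporal_edges:
        snapshots[t].append((u, v))
    return [(t, *classify_triple(snapshots[t], triple)) for t in sorted(snapshots)]


temporal_edges = [(0, 1, 1), (0, 2, 1), (0, 1, 2), (0, 2, 2), (1, 2, 2), (3, 4, 3)]
for row in motif_timeline(temporal_edges, (0, 1, 2)):
    print(row)
```

On real data, a timeline like this shows whether a triple switches between wedge and triangle or keeps reappearing in the same configuration. The abstract reports that the latter is the common case.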
