1 |
Offloading the sampling stage of GNN training to smart storage
Kritharakis, Emmanouil 16 February 2024
Graph Neural Networks (GNNs) have emerged as a robust model for machine learning on complex graph-structured data, in contrast to traditional deep learning techniques, which are primarily used for image and text data. However, scaling GNNs to large graphs with billions of nodes and trillions of edges remains a challenge. Existing approaches either partition the graph across distributed systems or use a single machine with GPU caching techniques during the sampling phase. The former suffers from maintenance costs and increased latency; the latter faces bottlenecks in data movement, resulting in inefficient resource utilization and suboptimal training. To address the limitations of single-machine techniques, we direct our attention to the sampling stage and introduce a novel approach that uses the Samsung SmartSSD computational storage device. Computational storage devices allow computations to be offloaded to their on-board computational units. In our method, we compute the required sampling subset on the field-programmable gate array (FPGA) of the SmartSSD and transfer only that subset to the host DRAM. This significantly reduces unnecessary data movement and shortens overall training time. Our experiments show that, compared to the baseline MMAP sampling method, the proposed solution achieves an improvement of up to 9 times in sampling time and up to 5 times in host DRAM utilization.
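The sampling stage that the abstract offloads can be pictured with a small host-side sketch. The abstract does not say which sampler is used, so the code below assumes uniform fan-out neighbor sampling over a graph stored in CSR form; the names `sample_neighbors`, `sample_subgraph`, `indptr`, `indices`, and `fanouts` are hypothetical. In the thesis this step runs on the SmartSSD FPGA; the sketch is only a plain-Python reference for the kind of subset that would be sent to host DRAM.

```python
import random


def sample_neighbors(indptr, indices, seeds, fanout, rng):
    """Sample at most `fanout` in-neighbors per seed node from a CSR graph.

    indices[indptr[v]:indptr[v + 1]] lists the source nodes of edges into v.
    Returns the sampled edges as (src, dst) pairs.
    """
    edges = []
    for dst in seeds:
        neighbors = indices[indptr[dst]:indptr[dst + 1]]
        if len(neighbors) > fanout:
            # Uniform sampling without replacement (an assumption of this sketch).
            neighbors = rng.sample(neighbors, fanout)
        edges.extend((src, dst) for src in neighbors)
    return edges


def sample_subgraph(indptr, indices, seeds, fanouts, seed=0):
    """Expand the seed nodes hop by hop, one fan-out per GNN layer."""
    rng = random.Random(seed)
    frontier = list(seeds)
    layers = []
    for fanout in fanouts:
        edges = sample_neighbors(indptr, indices, frontier, fanout, rng)
        layers.append(edges)
        # The next hop starts from the current nodes plus the newly reached sources.
        frontier = sorted({src for src, _ in edges} | set(frontier))
    return layers


# Toy graph with 4 nodes; only the sampled edges would need to leave storage.
indptr = [0, 2, 3, 5, 5]
indices = [1, 2, 2, 0, 3]
print(sample_subgraph(indptr, indices, seeds=[0], fanouts=[1, 2]))
```

Only the sampled edge lists (and the features of the nodes they touch) are needed for a training step, which is why computing them close to the data can avoid moving the full graph.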
|
2 |
Towards Structured Intelligence with Deep Graph Neural Networks
Li, Guohao 08 1900
Advances in convolutional neural networks and recurrent neural networks have led to significant improvements in learning on regular grid data domains such as images and text. However, many real-world datasets, for instance, social networks, citation networks, molecules, point clouds, and 3D meshes, do not lie on such a simple grid. Such data is irregular or non-Euclidean in structure and carries complex relational information. Graph machine learning, especially Graph Neural Networks (GNNs), offers a way to process such irregular data and to model the relations between entities, which is leading the machine learning field into a new era. However, previous state-of-the-art (SOTA) GNNs are limited to shallow architectures because of challenging problems such as vanishing gradients, over-fitting, and over-smoothing. Most SOTA GNNs are no deeper than 3 or 4 layers, which restricts the representational power of GNNs and makes learning on large-scale graphs ineffective. To resolve this challenge, this dissertation discusses approaches to building large-scale and efficient graph machine learning models for learning structured representations, with applications to engineering and the sciences. This work presents how to make GNNs go deep by introducing new architectural designs and how to automatically search for GNN architectures with novel neural architecture search algorithms.
|
3 |
A Multimodal Graph Convolutional Approach to Predict Genes Associated with Rare Genetic Diseases
Sahasrabudhe, Dhruva Shrikrishna 11 September 2020
There exist a large number of rare genetic diseases in humans. Our knowledge of the specific gene variants whose presence in the genome of a person predisposes them towards developing a disease, called gene associations, is incomplete. Computational tools that can predict genes that may be associated with a rare disease have great utility in healthcare. However, a majority of existing prediction algorithms require a set of already known "seed genes" to discover novel associations for a disease. This drawback becomes more serious for rare genetic diseases, since a large proportion do not have any known gene associations. In this work, we develop an approach for disease-gene association prediction that overcomes the reliance on seed genes. Our approach uses the similarity of the observable biological characteristics of diseases (i.e., phenotypes), along with a global map of direct and indirect human protein interactions, to transfer associations from diseases whose gene associations have been discovered to diseases with no known gene associations. We formulate disease-gene association prediction over a multimodal network of diseases and genes, and develop an approach based on graph convolutional networks. We show how our model design considerations impact prediction performance. We demonstrate that our approach outperforms simpler graph machine learning and traditional machine learning approaches, as well as a competitive network-propagation-based approach, for the task of predicting disease-gene associations. / Master of Science / There exist a large number of rare genetic diseases in humans. Our knowledge of the specific gene variants whose presence in the genome of a person predisposes them towards developing a disease, called gene associations, is incomplete. Computational tools that can predict genes that may be associated with a rare disease have great utility in healthcare. However, a majority of existing prediction algorithms require a set of already known "seed genes" to discover novel associations for a disease. This drawback becomes more serious for rare genetic diseases, since a large proportion do not have any known gene associations. In this work, we develop an approach for disease-gene association prediction that overcomes the reliance on seed genes. Our approach uses the similarity of the observable biological characteristics of diseases (i.e., disease phenotypes), along with a global map of direct and indirect human protein interactions, to transfer gene associations from diseases whose gene associations have been discovered to diseases with no known associations. We implement an approach based on the field of graph machine learning, namely graph convolutional networks, to predict the genes associated with rare genetic diseases. We show how our predictor performs compared to other approaches, and analyze some of the choices made in the design of the predictor, along with some properties of its outputs.
|
4 |
Supervised Inference of Gene Regulatory Networks
Sen, Malabika Ashit 09 September 2021
A gene regulatory network (GRN) records the interactions among transcription
factors and their target genes. GRNs are useful to study how transcription factors (TFs) control
gene expression as cells transition between states during differentiation and development.
Scientists usually construct GRNs by careful examination and study of the literature. This
process is slow and painstaking and does not scale to large networks. In this thesis, we study
the problem of inferring GRNs automatically from gene expression data. Recent data-driven
approaches to infer GRNs increasingly rely on single-cell level RNA-sequencing (scRNA-seq)
data. Most of these methods rely on unsupervised or association-based strategies, which
cannot leverage known regulatory interactions by design. To facilitate supervised learning,
we propose a novel graph convolutional neural network (GCN) based autoencoder to infer
new regulatory edges from a known GRN and scRNA-seq data. As the name suggests, a
GCN-based autoencoder consists of an encoder that learns a low-dimensional embedding
of the nodes (genes) in the input graph (the GRN) through a series of graph convolution
operations and a decoder that aims to reconstruct the original graph as accurately as possible.
We investigate several GCN-based architectures to determine the ideal encoder-decoder
combination for GRN reconstruction. We systematically study the performance of these
and other supervised learning methods on different mouse and human scRNA-seq datasets
for two types of evaluation. We demonstrate that our GCN-based approach substantially
outperforms traditional machine learning approaches. / Master of Science / In multi-cellular living organisms, stem cells differentiate into multiple cell types.
Proteins called transcription factors (TFs) control the activity of genes to effect these transitions.
It is possible to represent these interactions abstractly using a gene regulatory network
(GRN). In a GRN, each node is a TF or a gene and each edge connects a TF to a gene or
TF that it controls. New high-throughput technologies that can measure gene expression
(activity) in individual cells provide rich data that can be used to construct GRNs. In this
thesis, we take advantage of recent advances in the field of machine learning to develop
a new computational method for constructing GRNs. The distinguishing
property of our technique is that it is supervised, i.e., it uses experimentally known interactions
to infer new regulatory connections. We investigate several variations of this approach
to reconstruct a GRN as close to the original network as possible. We analyze and provide
a rationale for the decisions made in designing, evaluating, and choosing the characteristics
of our predictor. We show that our predictor has a reconstruction accuracy that is superior
to other supervised-learning approaches.
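The encoder-decoder structure described in this abstract can be sketched in a few lines. The thesis compares several architectures and the abstract does not say which one was chosen, so the code below assumes the common graph autoencoder form: two graph convolution layers with symmetric adjacency normalization as the encoder and an inner-product decoder. All function names and the toy weights are hypothetical, the weights are untrained, and the inner-product decoder gives symmetric scores, so it ignores the TF-to-target direction of a real GRN.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weights, relu=True):
    """One graph convolution: aggregate neighbor features, then transform."""
    out = matmul(matmul(a_norm, features), weights)
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def encode(adj, features, w1, w2):
    """Encoder: map each gene (node) to a low-dimensional embedding."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, features, w1)
    return gcn_layer(a_norm, hidden, w2, relu=False)


def decode(z):
    """Decoder: score every node pair as sigmoid(z_i . z_j)."""
    z_t = [list(col) for col in zip(*z)]
    logits = matmul(z, z_t)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


# Toy known GRN on 3 genes, with 2 expression-derived features per gene.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.2], [0.1, 0.3]]   # untrained placeholder weights
w2 = [[0.4], [-0.6]]
scores = decode(encode(adj, features, w1, w2))
print(scores)
```

In a supervised setting the weights would be trained so that scores for known regulatory edges are high and scores for non-edges are low; high-scoring pairs outside the known GRN then become candidate new edges.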
|
5 |
Bullying Detection through Graph Machine Learning: Applying Neo4j’s Unsupervised Graph Learning Techniques to the Friends Dataset
Enström, Olof, Eid, Christoffer January 2023
In recent years, the pervasive issue of bullying, particularly in academic institutions, has received a surge of attention. This report centers on using the Friends Dataset and Graph Machine Learning to detect possible instances of bullying in an educational setting. The importance of this research lies in its potential to enhance early detection and prevention mechanisms, thereby creating safer environments for students. Leveraging graph theory, Neo4j, the Graph Data Science Library, and similarity algorithms, among other tools and methods, we devised an approach for processing and analyzing the dataset. Our method involves data preprocessing, application of similarity and community detection algorithms, and validation of the results with domain experts. Our findings indicate that Graph Machine Learning can be used effectively to identify potential bullying scenarios, with a particular focus on discerning community structures and their influence on bullying. Our results, albeit preliminary, represent a promising step towards leveraging technology for bullying detection and prevention.
|
6 |
Dynamic Network Modeling from Temporal Motifs and Attributed Node Activity
Giselle Zeno (16675878) 26 July 2023
<p>The most important networks from different domains (such as computing, organizational, economic, social, academic, and biological networks) are networks that change over time. For example, in an organization there are email and collaboration networks (e.g., different people or teams working on a document). Apart from the connectivity of the networks changing over time, they can contain attributes such as the topic of an email or message, the contents of a document, or the interests of a person in an academic citation or social network. Analyzing these dynamic networks can be critical in decision-making processes. For instance, in an organization, insight into how people from different teams collaborate provides important information that can be used to optimize workflows.</p>
<p>Network generative models provide a way to study and analyze networks. For example, benchmarking model performance and generalization in tasks like node classification can be done by evaluating models on synthetic networks generated with varying structure and attribute correlation. In this work, we begin by presenting our systematic study of the impact that graph structure and attribute auto-correlation have on the task of node classification using collective inference. This is the first time such an extensive study has been done. We take advantage of a recently developed method that samples attributed networks (although static ones) with varying network structure jointly with correlated attributes. We find that the graph connectivity that contributes to the network auto-correlation (i.e., the local relationships of nodes) and the density have the highest impact on the performance of collective inference methods.</p>
<p>Most of the literature to date has focused on static representations of networks, partially due to the difficulty of finding readily available datasets of dynamic networks. Dynamic network generative models can bridge this gap by generating synthetic graphs similar to observed real-world networks. Given that motifs have been established as building blocks for the structure of real-world networks, modeling them can help to generate the graph structure seen and capture correlations in node connections and activity. Therefore, we continue with a study of motif evolution in <em>dynamic</em> temporal graphs. Our key insight is that, in fast-changing dynamic networks, motifs rarely change configurations (e.g., wedges into triangles, and vice versa), but rather keep reappearing at different times while keeping the same configuration. This finding motivates the generative process of our proposed models, which use temporal motifs as building blocks to generate dynamic graphs with links that appear and disappear over time.</p>
<p>Our first proposed model generates dynamic networks based on motif-activity and the roles that nodes play in a motif. For example, a wedge is sampled based on the likelihood of one node having the role of hub with the two other nodes being the spokes. Our model learns all parameters from observed data, with the goal of producing synthetic graphs with similar graph structure and node behavior. We find that using motifs and node roles helps our model generate the more complex structures and the temporal node behavior seen in real-world dynamic networks.</p>
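<p>The role-based wedge sampling mentioned above can be illustrated with a minimal sketch. The abstract does not give the model's actual probabilities or learning procedure, so the per-node role weights <code>hub_weight</code> and <code>spoke_weight</code> and the functions <code>sample_wedge</code> and <code>wedge_edges</code> are hypothetical; the sketch simply draws one hub in proportion to its hub weight and two distinct spokes in proportion to their spoke weights.</p>

```python
import random


def sample_wedge(hub_weight, spoke_weight, rng):
    """Draw one wedge: a hub node plus two distinct spoke nodes.

    hub_weight and spoke_weight map each node to a non-negative role weight
    (hypothetical stand-ins for role likelihoods learned from data).
    """
    nodes = list(hub_weight)
    hub = rng.choices(nodes, weights=[hub_weight[n] for n in nodes], k=1)[0]
    candidates = [n for n in nodes if n != hub]
    spokes = []
    for _ in range(2):
        pick = rng.choices(
            candidates, weights=[spoke_weight[n] for n in candidates], k=1
        )[0]
        spokes.append(pick)
        candidates.remove(pick)  # keep the two spokes distinct
    return hub, tuple(spokes)


def wedge_edges(hub, spokes):
    """A wedge has exactly two edges, both incident to the hub."""
    return [(hub, spokes[0]), (hub, spokes[1])]


rng = random.Random(0)
hub_weight = {"a": 3.0, "b": 1.0, "c": 0.5, "d": 0.5}
spoke_weight = {"a": 0.5, "b": 1.0, "c": 2.0, "d": 2.0}
hub, spokes = sample_wedge(hub_weight, spoke_weight, rng)
print(wedge_edges(hub, spokes))
```

<p>Repeating such draws at different time steps would give links that appear and reappear in the same configuration, which is the behavior the motif study above points to.</p>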
<p>After observing that using motif node-roles helps to capture the changing local structure and behavior of nodes, we extend our work to also consider the attributes generated by nodes’ activities. We propose a second generative model for attributed dynamic networks that (i) captures network structure dynamics through temporal motifs, and (ii) extends the structural roles of nodes in motifs to roles that generate content embeddings. Our newly proposed model is the first to generate synthetic dynamic networks and sample content embeddings based on motif node roles. To the best of our knowledge, it is the only attributed dynamic network model that can generate <em>new</em> content embeddings, that is, embeddings not observed in the input graph but still similar to those of the input graph. Our results show that modeling the network attributes with higher-order structures (e.g., motifs) improves the quality of the networks generated.</p>
<p>The generative models proposed address the difficulty of finding readily available datasets of dynamic networks, attributed or not. This work will also allow others to: (i) generate networks that they can share without divulging individuals’ private data, (ii) benchmark model performance, and (iii) explore model generalization on a broader range of conditions, among other uses. Finally, the evaluation measures proposed will elucidate models, allowing fellow researchers to push forward in these domains.</p>
|