  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction

Florescu, Corina Andreea 08 1900 (has links)
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns which are not captured by the graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure in order to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
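The unsupervised baseline this abstract describes, one importance score per candidate word derived from its neighbors in a word graph, can be sketched as follows. This is a minimal illustration and not the SurfKE model itself: the co-occurrence window, the damping factor, and the function names `build_word_graph` and `pagerank` are assumptions made for the example.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph: link each word to the next `window` words."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:  # skip self-loops
                graph[word].add(tokens[j])
                graph[tokens[j]].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration: a word is important if its neighbors are important."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # each neighbor m shares its score equally among its own neighbors
            incoming = sum(score[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


# Illustrative input only (hypothetical sentence, not data from the thesis).
tokens = "graph based keyphrase extraction ranks words in a word graph".split()
graph = build_word_graph(tokens)
scores = pagerank(graph)
```

Each word ends up with exactly one number, which is the limitation the abstract points to: a single score cannot carry several kinds of structural information at once.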
2

Recommending Collaborations Using Link Prediction

Chennupati, Nikhil 27 May 2021 (has links)
No description available.
3

Multiomics Data Integration and Multiplex Graph Neural Network Approaches

Kesimoglu, Ziynet Nesibe 05 1900 (has links)
With increasing data and technology, multiple types of data from the same set of nodes have been generated. Since each data modality contains a unique aspect of the underlying mechanisms, multiple datatypes are integrated. In addition to multiple datatypes, networks are important to store information representing associations between entities such as genes of a protein-protein interaction network and authors of a citation network. Recently, some advanced approaches to graph-structured data leverage node associations and features simultaneously, called Graph Neural Network (GNN), but they have limitations for integrative approaches. The overall aim of this dissertation is to integrate multiple data modalities on graph-structured data to infer some context-specific gene regulation and predict outcomes of interest. To this end, first, we introduce a computational tool named CRINET to infer genome-wide competing endogenous RNA (ceRNA) networks. By integrating multiple data properly, we had a better understanding of gene regulatory circuitry addressing important drawbacks pertaining to ceRNA regulation. We tested CRINET on breast cancer data and found that ceRNA interactions and groups were significantly enriched in the cancer-related genes and processes. CRINET-inferred ceRNA groups supported the studies claiming the relation between immunotherapy and cancer. Second, we present SUPREME, a node classification framework, by comprehensively analyzing multiple data and associations between nodes with graph convolutions on multiple networks. Our results on survival analysis suggested that SUPREME could demystify the characteristics of classes with proper utilization of multiple data and networks. Finally, we introduce an attention-aware fusion approach, called GRAF, which fuses multiple networks and utilizes attention mechanisms on graph-structured data. 
Utilization of learned node- and association-level attention with network fusion allowed us to prioritize the edges properly, leading to improvement in the prediction results. Given the findings of all three tools and their outperformance over state-of-the-art methods, the proposed dissertation shows the importance of integrating multiple types of data and the exploitation of multiple graph structured data.
4

An unsupervised method for Graph Representation Learning

Ren, Yi January 2022 (has links)
Internet services, such as online shopping and chat apps, have been spreading significantly in recent years, generating substantial amounts of data. These data are precious for machine learning and consist of connections between different entities, such as users and items. These connections contain important information essential for ML models to exploit, and the need to extract this information from graphs gives rise to Graph Representation Learning (GRL). By training on these data using Graph Representation Learning methods, hidden information can be obtained, and services can be improved. Initially, the models used for Graph Representation Learning were unsupervised, such as Deepwalk and Node2vec. These models originated from the field of Natural Language Processing (NLP). They are easy to apply, but their performance is not satisfactory. On the other hand, while supervised models like GNN and GCN perform better than unsupervised models, they require a huge effort to label the data and finetune the model. Nowadays, datasets have become larger and more complex, which makes applying these supervised models an even heavier burden. A recent breakthrough in the field of Natural Language Processing may solve the problem. In the paper ‘Attention is all you need’, the authors introduce the Transformer model, which shows excellent performance in NLP. Considering that the field of NLP has much in common with GRL and that the first unsupervised models all originated from NLP, it is reasonable to ask whether we can take advantage of the Transformer to improve the performance of the unsupervised model in GRL. Generating embeddings for nodes in the graph is one of the significant tasks of GRL. In this thesis, the performance of the Transformer model in generating embeddings is tested. Three popular datasets (Cora, Citeseer, Pubmed) are used in training, and the embedding quality is measured through node classification with a linear classification algorithm. 
Another part of the thesis is to finetune the model to determine the effect of model parameters on embedding accuracy. In this part, comparison experiments are conducted on the dimensions, the number of layers, the sample size, and other parameters. The experiments show that the Transformer model performs better in generating embeddings than the original methods, such as Deepwalk. Compared to supervised methods, it requires less finetuning and less training time. The characteristics of the Transformer model revealed by the experiments show that it is a good alternative to the baseline model for embedding generation. Improvements may be made to the preprocessing and loss function of the model to get higher performance.
5

Dynamic Graph Representation Learning on Enterprise Live Video Streaming Events

Stefanidis, Achilleas January 2020 (has links)
Enterprises use live video streaming as a means of communication. Streaming high-quality video to thousands of devices in a corporate network is not an easy task; the bandwidth requirements often exceed the network capacity. To address this, Peer-To-Peer (P2P) networks have proven beneficial, as peers can exchange content efficiently by utilizing the topology of the corporate network. However, such networks are dynamic and their topology might not always be known. In this project we propose ABD, a new dynamic graph representation learning approach, which aims to estimate the bandwidth capacity between peers in a corporate network. The architecture of ABD is adapted to the properties of corporate networks. The model is composed of an attention mechanism and a decoder. The attention mechanism produces node embeddings, while the decoder converts those embeddings into bandwidth predictions. The model aims to capture both the dynamicity and the structure of the dynamic network, using an advanced training process. The performance of ABD is tested with two dynamic graphs which were produced by real corporate networks. Our results show that ABD achieves better results when compared to existing state-of-the-art dynamic graph representation learning models.
6

Fairness through domain awareness : mitigating popularity bias for music discovery

Salganik, Rebecca 11 1900 (has links)
The last decade has brought with it a wave of innovative technology, shifting the channels through which creative content is created, consumed, and categorized. And, as our interactions with creative multimedia content shift towards online platforms, the sheer quantity of content on these platforms has necessitated the integration of algorithmic guidance in the discovery of these spaces. In this way, the recommendation algorithms that guide users' interactions with various art forms have been cast into the role of gatekeepers and begun to play an increasingly influential role in shaping the creation of artistic content. The work laid out in the following chapters fuses three major areas of research: graph representation learning, music information retrieval, and fairness as applied to the task of music recommendation. In recent years, graph neural networks (GNNs), a powerful new architecture which enables deep learning approaches to be applied to graph or network structures, have proven incredibly influential in the music recommendation domain. In tandem with the striking performance gains that GNNs are able to achieve, many of these systems, have been shown to be strongly influenced by the degree, or number of outgoing edges, of individual nodes. More concretely, recent works have uncovered disparities in the qualities of representations learned by state of the art GNNs between nodes which are strongly and weakly connected. Translating these findings to the sphere of recommender systems, where nodes and edges are used to represent the interactions between users and various items, these disparities in representation that are contingent upon a node's connectivity can be seen as a form of popularity bias. And, indeed, within the broader recommendation community, popularity bias has long been considered an open problem, in which recommender systems begin to favor mainstream content over, potentially more relevant, but niche or novel items. 
If left unchecked, these algorithmic nudges towards previously popular content can create, intensify, and enforce negative cycles that perpetuate disparities in representation on both the user and the creator ends of the content consumption pipeline. Particularly in the recommendation of creative (e.g. musical) content, the downstream effects of these disparities in visibility can have genuine economic consequences for artists from under-represented communities. Thus, the problem of popularity bias is something that must be addressed from both a technical and societal perspective. And, as the influence of recommender systems continues to spread, the effects of this phenomenon only become more spurious, as they begin to have critical downstream effects that shape the larger ecosystems in which art is created. Thus, the broad focus of this thesis is the mitigation of popularity bias in music recommendation. In order to tailor our exploration of this issue to the graph domain, we begin by formalizing the relationship between degree fairness and popularity bias. In doing so, we concretely define the notion of popularity, grounding it in the structural principles of an interaction network, and enabling us to design objectives that can mitigate the effects of popularity on representation learning. In our first work, we focus on understanding the effects of sampling on degree fairness in uni-partite graphs. The purpose of this work is to lay the foundation for the graph neural network model which will underlie our music recommender system. We then build on this first work by extending the initial fairness framework to be compatible with bi-partite graphs and applying it to the music domain. The motivation of this work is rooted in the notion of discovery, or the idea that users engage with algorithmic curation in order to find content that is both novel and relevant to their artistic tastes. 
We present the intrinsic relationship between discovery objectives and the presence of popularity bias, explaining that the presence of popularity bias can blind a system to the musical qualities that underpin the underlying needs of music listening. As we will explain in later sections, one of the key elements of this work is our ability to ground our fairness notion in the musical domain. Thus, we propose a domain-aware, individual fairness-based approach which addresses popularity bias in graph neural network (GNN)-based recommender systems. In order to facilitate this domain awareness, we perform extensive dataset augmentation, taking two state-of-the-art music recommendation datasets and augmenting them with rich multi-modal node-level features. Finally, we ground our evaluation in the cold start setting, showing the importance of inductive methodologies in the music space. / The last decade has brought with it a wave of innovative technologies, changing the way creative content is created, consumed, and categorized. And, as our interactions with creative multimedia content move to online platforms, the quantity of content on these platforms has necessitated the integration of algorithmic guidance in the discovery of these spaces. In this way, the recommendation algorithms that guide users' interactions with various art forms have been cast into the role of gatekeepers and have begun to play an increasingly influential role in shaping the creation of artistic content. The work presented in the following chapters fuses three major areas of research: graph representation learning, music information retrieval, and fairness as applied to the task of music recommendation. 
As the influence of recommender systems continues to spread and intensify, it is crucial to consider the downstream effects that design choices can have on the broader ecosystem of artistic creation. In recent years, the integration of social networks into the task of music recommendation has given rise to graph neural networks (GNNs), a new architecture capable of making predictions on graph structures. Alongside the striking gains that GNNs are able to achieve, many of these systems can also fall prey to popularity bias, forcing them to favor mainstream content over potentially more relevant, but niche or novel, items. If left unchecked, this negative cycle can perpetuate disparities in representation between the music of minority artists, genres, or populations. In doing so, disparities in the visibility of items can cause problems from both a performance and a societal perspective. The objective of the thesis is the mitigation of popularity bias. First, the work formalizes the links between individual fairness and the presence of popularity bias among creative content. Next, we extend an individual fairness framework, applying it to the domain of music recommendation. The core of this thesis revolves around the proposal of a domain-aware, individual fairness-based approach that addresses popularity bias in recommender systems based on graph neural networks (GNNs). One of the key elements of this work is our ability to ground our notion of fairness in the musical domain. 
In order to facilitate this domain awareness, we perform extensive dataset augmentation, taking two state-of-the-art music recommendation datasets and augmenting them with rich multimodal node-level features. Finally, we ground our evaluation in the cold-start setting, showing the importance of inductive methodologies in the music space.
7

Intersecting Graph Representation Learning and Cell Profiling : A Novel Approach to Analyzing Complex Biomedical Data

Chamyani, Nima January 2023 (has links)
In recent biomedical research, graph representation learning and cell profiling techniques have emerged as transformative tools for analyzing high-dimensional biological data. The integration of these methods, as investigated in this study, has facilitated an enhanced understanding of complex biological systems, consequently improving drug discovery. The research aimed to decipher connections between chemical structures and cellular phenotypes while incorporating other biological information like proteins and pathways into the workflow. To achieve this, machine learning models' efficacy was examined for classification and regression tasks. The newly proposed graph-level and bio-graph integrative predictors were compared with traditional models. Results demonstrated their potential, particularly in classification tasks. Moreover, the topology of the COVID-19 BioGraph was analyzed, revealing the complex interconnections between chemicals, proteins, and biological pathways. By combining network analysis, graph representation learning, and statistical methods, the study was able to predict active chemical combinations within inactive compounds, thereby exhibiting significant potential for further investigations. Graph-based generative models were also used for molecule generation opening up further research avenues in finding lead compounds. In conclusion, this study underlines the potential of combining graph representation learning and cell profiling techniques in advancing biomedical research in drug repurposing and drug combination. This integration provides a better understanding of complex biological systems, assists in identifying therapeutic targets, and contributes to optimizing molecule generation for drug discovery. Future investigations should optimize these models and validate the drug combination discovery approach. 
As these techniques continue to evolve, they hold the potential to significantly impact the future of drug screening, drug repurposing, and drug combinations.
8

Reasoning with structure : graph neural networks algorithms and applications

Deac, Andreea-Ioana 08 1900 (has links)
Since the deep learning revolution, machine learning has excelled at tasks based on images and text, many successes being possible under the umbrella of the computer vision and natural language processing fields. However, much remains that cannot be expressed in these forms without losing information. For these cases, the field of geometric deep learning was developed, covering the space of more general representations, for data whose underlying structure doesn't match the single-dimensional string of characters (text) or 2-D shape (images) format. In this thesis, I will particularly focus on graphs. 
Graphs are ubiquitous data structures underlying virtually all tasks of interest, including natural inputs such as molecules, entity relations for example transportation networks and chip placements, or concept linking in reasoning processes, including algorithms and other theoretical constructs. While modern expressive graph neural network architectures can achieve impressive results on benchmarks like these, their practical application is still plagued with many issues and shortcomings, which this thesis will address. The considerations from these applications will set the scene for the following chapters, which focus on tackling the limitations of graph neural networks by proposing new graph learning algorithms. Firstly, I focus on improving graph neural networks for data that requires long-range interactions by building general templates to complement their computation graph. This is followed by graph neural networks for heterophilic data, where the edges tend to connect nodes from different classes; in this case, a specialised modification of the computation graph meant to improve homophily alleviates the problem. In the third article, I leverage a strength of graph neural networks -- their alignment with dynamic programming. This enables graph neural networks to execute algorithms, based on which I propose a new class of implicit planners for decision making. Lastly, I capitalise on the utility of geometric deep learning in reinforcement learning and extend it beyond GNNs, leveraging rotation-equivariant neural networks in model-based agents.
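The alignment with dynamic programming mentioned in this abstract can be illustrated with a classical example. The sketch below writes Bellman-Ford shortest paths as rounds of message passing: every node collects messages from its in-neighbors and aggregates them with `min`, which has the same shape as a GNN layer with a fixed, non-learned update. The choice of Bellman-Ford and the function name are assumptions made for illustration; the abstract does not say which algorithms the thesis executes.

```python
import math


def bellman_ford_message_passing(num_nodes, edges, source):
    """Shortest-path distances from `source`; `edges` holds (u, v, weight) triples."""
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):  # one round corresponds to one "layer"
        new_dist = list(dist)
        for u, v, w in edges:
            # message: sender state plus edge weight; aggregation: min
            new_dist[v] = min(new_dist[v], dist[u] + w)
        dist = new_dist
    return dist


# Small hypothetical graph used only to exercise the function.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
distances = bellman_ford_message_passing(4, edges, source=0)
print(distances)  # [0.0, 3.0, 1.0, 8.0]
```

A learned GNN would replace the fixed "add the weight, take the min" rule with trainable message and update functions, while keeping the same neighbor-wise computation pattern.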
9

On Higher Order Graph Representation Learning

Balasubramaniam Srinivasan (12463038) 26 April 2022 (has links)
<p>Research on graph representation learning (GRL) has made major strides over the past decade, with widespread applications in domains such as e-commerce, personalization, fraud & abuse, life sciences, and social network analysis. Despite its widespread success, fundamental questions on practices employed in modern day GRL have remained unanswered. Unraveling and advancing two such fundamental questions on the practices in modern day GRL forms the overarching theme of my thesis.</p> <p>The first part of my thesis deals with the mathematical foundations of GRL. GRL is used to solve tasks such as node classification, link prediction, clustering, graph classification, and so on, albeit with seemingly different frameworks (e.g. Graph neural networks for node/graph classification, (implicit) matrix factorization for link prediction/ clustering, etc.). The existence of very distinct frameworks for different graph tasks has puzzled researchers and practitioners alike. In my thesis, using group theory, I provide a theoretical blueprint that connects these seemingly different frameworks, bridging methods like matrix factorization and graph neural networks. With this renewed understanding, I then provide guidelines to better realize the full capabilities of these methods in a multitude of tasks.</p> <p>The second part of my thesis deals with cases where modeling real-world objects as a graph is an oversimplified description of the underlying data. Specifically, I look at two such objects (i) modeling hypergraphs (where edges encompass two or more vertices) and (ii) using GRL for predicting protein properties. Towards (i) hypergraphs, I develop a hypergraph neural network which takes advantage of the inherent sparsity of real world hypergraphs, without unduly sacrificing on its ability to distinguish non isomorphic hypergraphs. 
The designed hypergraph neural network is then leveraged to learn expressive representations of hyperedges for two tasks, namely hyperedge classification and hyperedge expansion. Experiments show that using our network results in improved performance over the current approach of converting the hypergraph into a dyadic graph and using (dyadic) GRL frameworks. Towards (ii) proteins, I introduce the concept of conditional invariances and leverage it to model the inherent flexibility present in proteins. Using conditional invariances, I provide a new framework for GRL which can capture protein-dependent conformations and ensures that all viable conformers of a protein obtain the same representation. Experiments show that endowing existing GRL models with my framework shows noticeable improvements on multiple different protein datasets and tasks.</p>
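The baseline this abstract compares against, converting a hypergraph into a dyadic graph, can be sketched with clique expansion, where every hyperedge is replaced by pairwise edges among its vertices. Clique expansion is assumed here as one common conversion; the abstract does not name the exact method, and `clique_expansion` is a hypothetical helper.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace each hyperedge by all pairwise edges among its vertices."""
    edges = set()
    for hyperedge in hyperedges:
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


# One 3-way relation versus three separate pairwise relations (toy inputs).
one_triple = [{"a", "b", "c"}]
three_pairs = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

same_graph = clique_expansion(one_triple) == clique_expansion(three_pairs)
print(same_graph)  # True
```

Both inputs collapse to the same dyadic graph, which shows the kind of information a pairwise representation can lose and a hypergraph-native model can keep.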
10

Taxonomy of datasets in graph learning : a data-driven approach to improve GNN benchmarking

Cantürk, Semih 12 1900 (has links)
The core research of this thesis, mostly comprising chapter four, has been accepted to the Learning on Graphs (LoG) 2022 conference for a spotlight presentation as a standalone paper, under the title "Taxonomy of Benchmarks in Graph Representation Learning", and is to be published in the Proceedings of Machine Learning Research (PMLR) series. As a main author of the paper, my specific contributions to this paper cover problem formulation, design and implementation of our taxonomy framework and experimental pipeline, collation of our results and of course the writing of the article. / Deep learning on graphs has attained unprecedented levels of success in recent years thanks to Graph Neural Networks (GNNs), specialized neural network architectures that have unequivocally surpassed prior graph learning approaches. GNNs extend the success of neural networks to graph-structured data by accounting for their intrinsic geometry. While extensive research has been done on developing GNNs with superior performance according to a collection of graph representation learning benchmarks, current benchmarking procedures are insufficient to provide fair and effective evaluations of GNN models. Perhaps the most prevalent and at the same time least understood problem with respect to graph benchmarking is "domain coverage": Despite the growing number of available graph datasets, most of them do not provide additional insights and on the contrary reinforce potentially harmful biases in GNN model development. 
This problem stems from a lack of understanding with respect to what aspects of a given model are probed by graph datasets. For example, to what extent do they test the ability of a model to leverage graph structure vs. node features? Here, we develop a principled approach to taxonomize benchmarking datasets according to a "sensitivity profile" that is based on how much GNN performance changes due to a collection of graph perturbations. Our data-driven analysis provides a deeper understanding of which benchmarking data characteristics are leveraged by GNNs. Consequently, our taxonomy can aid in selection and development of adequate graph benchmarks, and better informed evaluation of future GNN methods. Finally, our approach and implementation in the GTaxoGym package (https://github.com/G-Taxonomy-Workgroup/GTaxoGym) are extendable to multiple graph prediction task types and future datasets.
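The "sensitivity profile" idea, how much model performance changes under each graph perturbation, can be sketched as a simple relative-change computation. This is an assumption-laden illustration and not the GTaxoGym implementation: the formula, the function name, the perturbation names, and the scores are all hypothetical placeholders rather than reported results.

```python
def sensitivity_profile(baseline, perturbed):
    """Relative change in a performance score for each named perturbation."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return {name: (score - baseline) / baseline for name, score in perturbed.items()}


# Placeholder numbers invented for the example; they are not measurements.
profile = sensitivity_profile(
    baseline=0.80,
    perturbed={"remove_edges": 0.40, "shuffle_features": 0.72},
)
for name, change in profile.items():
    print(f"{name}: {change:+.0%}")
```

Under these made-up values, a dataset whose score drops far more when edges are removed than when features are shuffled would be read as probing a model's use of graph structure more than its use of node features.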
