31 |
Using machine learning to visualize and analyze attack graphs
Cottineau, Antoine, January 2021
In recent years, the security of many corporate networks has been compromised by hackers who managed to obtain important information by leveraging the vulnerabilities of those networks. Such attacks can have a strong economic impact and affect the image of the entity whose network has been attacked. Network security analysts use various tools to study and improve the security of networks, and attack graphs are among them. Attack graphs show all the possible chains of exploits an attacker could follow to access an important host on a network. While attack graphs are useful for network security, they become hard to read as networks, and therefore the graphs, grow larger. Previous work tried to deal with this issue by applying simplification algorithms to the graphs. Although these algorithms can help improve the visualization of attack graphs, we believe that further improvements can be made, especially by relying on Machine Learning (ML) algorithms. Thus, the goal of this thesis is to investigate how ML can help improve the visualization of attack graphs and the security analysis of networks based on their attack graph. To reach this goal, we focus on two main areas. First, we use graph clustering, which is the process of creating a partition of the nodes based on their position in the graph. This improves visualization by allowing network analysts to focus on a set of related nodes instead of visualizing the whole graph. We also design several metrics for security analysis based on attack graphs. We show that the ML algorithms are as effective as non-ML algorithms in both areas. The ML clustering algorithms even produce better clusters than non-ML algorithms with respect to the coverage metric, at the cost of computation time. Moreover, the ML security evaluation algorithms show faster computation times on dense attack graphs than the non-ML baseline, while producing similar results. 
Finally, a user interface that allows the methods presented in the thesis to be applied is also developed, with the goal of making these methods easier for network analysts to use. / In recent years, the security of many corporate networks has been compromised by hackers who managed to obtain important information by exploiting the vulnerabilities of those networks. Such attacks can have a strong economic impact and affect the image of the entity whose network has been attacked. Network security analysts use various tools to study and improve the security of networks. Attack graphs are among these tools. Attack graphs are diagrams that show all the possible chains of exploits an attacker can follow to reach an important host in a network. Although attack graphs are useful for network security, they can become hard to read because of their size as networks grow. Previous work tried to handle this problem by applying simplification algorithms to graphs. Although these algorithms can help improve the visualization of attack graphs, we believe that improvements can be made, especially by relying on Machine Learning (ML) algorithms. Thus, the goal of this thesis is to investigate how ML can help improve the visualization of attack graphs and the security analysis of networks based on their attack graph. To reach this goal, we focus on two main areas. First, we use graph clustering, which is the process of creating a partition of the nodes based on their position in the graph. This improves visualization by letting network analysts focus on a set of related nodes instead of visualizing the whole graph. We also design several metrics for security analysis based on attack graphs. We show that the ML algorithms are as effective as non-ML algorithms in both areas. 
The ML clustering algorithms even produce better clusters than non-ML algorithms with respect to the coverage metric, at the cost of computation time. Moreover, the ML security evaluation algorithms show faster computation times on dense attack graphs than the non-ML baseline, while producing similar results. Finally, a user interface that allows the methods presented in the thesis to be applied is also developed, with the goal of making such methods easier for network analysts to use.
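The coverage metric named in this abstract is commonly defined as the fraction of edges whose two endpoints fall in the same cluster. The sketch below illustrates that common definition only; the thesis may define or weight it differently, and the toy graph, the node names, and the `coverage` helper are hypothetical.

```python
def coverage(edges, cluster_of):
    """Fraction of edges whose two endpoints are assigned to the same cluster."""
    edges = list(edges)
    if not edges:
        return 1.0  # convention chosen here for a graph without edges
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


# Hypothetical toy attack graph: nodes a-c in one cluster, d-e in another.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

# Four of the five edges stay inside a cluster; only c-d crosses clusters.
score = coverage(edges, cluster_of)
print(score)
```

A higher value means fewer edges are cut by the partition, which is one reason a clustering with high coverage can be easier to inspect one cluster at a time.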
|
32 |
Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images / Apport des modèles graphiques à l'analyse et à l'indexation d'images de documents
Luqman, Muhammad Muzzamil, 02 March 2012
This thesis addresses the lack of performance of tools that exploit graph-based representations in pattern recognition. We aim to contribute to the new methods that seek to take advantage of both the richness of structural methods and the speed of statistical pattern recognition methods. Two main contributions are presented in this manuscript. The first is a new method of explicit graph embedding that proceeds by multi-facet analysis of graphs. This method characterizes graphs at different levels which correspond, in our view, to the key points of graph-based representations: it captures the information carried by a graph at the global level, the structural level, and the local or elementary level. The captured information is encapsulated in a numeric feature vector that employs fuzzy histograms. The proposed method also uses an unsupervised learning mechanism to adapt its parameters automatically to the graph dataset being processed, without requiring a prior training phase. The second contribution is an architecture for indexing large collections of graphs so that subgraphs present in the collection can subsequently be searched for. This architecture applies the preceding explicit graph embedding method to all cliques of order 2 that can be extracted from the graphs in the collection to be indexed, so that they can be classified. This classification builds the index that serves as the basis for describing the graphs, and hence for indexing them, without requiring any pre-labeled training set. The proposed method is applicable to many domains, bringing the flexibility of a query-by-example system and the granularity of focused retrieval techniques. 
/ This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. Its contributions are twofold. The first contribution is a new method of explicit graph embedding. The proposed method exploits multilevel analysis of graphs to extract graph-level, structural-level and elementary-level information. It embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimize the information loss incurred when mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and can automatically adapt its parameters to the underlying graph dataset. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework exploits explicit graph embedding to represent the cliques of order 2 by numeric feature vectors, together with classification and clustering tools for automatically indexing a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering the ease of query by example (QBE) and the granularity of focused retrieval.
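The fuzzy overlapping trapezoidal intervals mentioned above can be illustrated with a small fuzzy histogram, in which a value lying between two bins contributes partially to both instead of falling entirely into one. This is only a sketch of the general idea: the interval boundaries and sample values are hypothetical, and the thesis learns its intervals from the graph dataset rather than fixing them by hand.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid rising on [a, b], flat on [b, c], falling on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzy_histogram(values, intervals):
    """Accumulate the membership of every value in every trapezoidal interval."""
    hist = [0.0] * len(intervals)
    for x in values:
        for i, (a, b, c, d) in enumerate(intervals):
            hist[i] += trapezoid(x, a, b, c, d)
    return hist


# Hypothetical intervals whose slopes overlap so that memberships sum to 1.
intervals = [(0, 0, 2, 4), (2, 4, 6, 8), (6, 8, 10, 10)]
# Hypothetical attribute values, e.g. node degrees taken from one graph.
hist = fuzzy_histogram([1, 3, 5, 7], intervals)
print(hist)
```

With crisp bins, a small perturbation that moves a value across a bin boundary changes the feature vector abruptly; with overlapping trapezoids the change is gradual, which is the noise-robustness argument the abstract makes.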
|
33 |
New PDE models for imaging problems and applications
Calatroni, Luca, January 2016
Variational methods and Partial Differential Equations (PDEs) have been extensively employed for the mathematical formulation of a myriad of problems describing physical phenomena such as heat propagation, thermodynamic transformations and many more. In imaging, PDEs following variational principles are often considered. In their general form these models combine a regularisation term and a data fitting term, balancing one against the other appropriately. Total variation (TV) regularisation is often used due to its edge-preserving and smoothing properties. In this thesis, we focus on the design of TV-based models for several different applications. We start by considering PDE models encoding higher-order derivatives to overcome well-known drawbacks of TV reconstruction. Due to their high differential order and nonlinear nature, computing the numerical solution of these equations is often challenging. We propose directional splitting techniques and use Newton-type methods that, despite these numerical hurdles, yield reliable and efficient computational schemes. Next, we discuss the problem of choosing the appropriate data fitting term when multiple noise statistics are present in the data due, for instance, to different acquisition and transmission problems. We propose a novel variational model which encodes the different noise distributions appropriately and consistently in this case. Balancing the effect of the regularisation against the data fitting is also crucial. To this end, we consider a learning approach which estimates the optimal ratio between the two from training sets of examples via bilevel optimisation. Numerically, we use a combination of semismooth Newton (SSN) and quasi-Newton methods to solve the problem efficiently. Finally, we consider TV-based models in the framework of graphs for image segmentation problems. 
Here, spectral properties combined with matrix completion techniques are needed to overcome the computational limitations due to the large amount of image data. Further, a semi-supervised technique for the measurement of the segmented region by means of the Hough transform is proposed.
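For orientation, the combination of a regularisation term and a data fitting term described in this abstract can be written in its most familiar form, the classical TV denoising model with a squared L2 fidelity. This is the standard textbook formulation, shown under the assumption of Gaussian noise; it is not one of the new models proposed in the thesis, and the symbols below are introduced here for illustration.

```latex
\min_{u} \; \mathrm{TV}(u) + \frac{\lambda}{2}\,\lVert u - f \rVert_{2}^{2},
\qquad
\mathrm{TV}(u) = \int_{\Omega} \lvert \nabla u \rvert \, dx
```

Here $f$ is the observed noisy image on the domain $\Omega$, $u$ is the reconstruction, and $\lambda > 0$ sets the balance between the two terms: a larger $\lambda$ keeps $u$ closer to the data, a smaller one smooths more. The learning approach in the abstract concerns estimating this kind of balance from training examples.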
|
34 |
Synthetic Graph Generation at Scale : A novel framework for generating large graphs using clustering, generative models and node embeddings / Storskalig generering av syntetiska grafer : En ny arkitektur för att tillverka stora grafer med hjälp av klustring, generativa modeller och nodinbäddningar
Hammarstedt, Johan, January 2022
The field of generative graph models has seen increased popularity in recent years, as such models allow us to capture the underlying distribution of a network and thus recreate it. From anonymizing sensitive information in social networks to data augmentation for rare brain diseases, the ability to generate synthetic data has multiple applications in various domains. However, most current methods face the bottleneck of trying to generate the entire adjacency matrix and are thus limited to graphs with fewer than tens of thousands of nodes. In contrast, large real-world graphs like social networks or transaction graphs can extend significantly beyond these boundaries. Furthermore, the current scalable approaches are predominantly based on stochasticity and do not capture local structures and communities. In this paper, we propose Graphwave Edge-Linking CELL, or GELCELL, a novel three-step architecture for generating graphs at scale. First, instead of constructing the entire network, GELCELL partitions the data and generates each cluster separately, allowing for efficient and parallelizable training. Then, by encoding the nodes, it trains a classifier to predict the edges between the partitions and patch them together, creating a synthetic version of the original large graph. Although it suffers from some limitations due to necessary constraints on the cluster sizes, the results show that GELCELL, given optimized parameters, can produce graphs with reasonable accuracy on all data tested, the largest having 400 000 nodes and 1 000 000 edges. / Generative graph models have seen increased popularity in recent years because they make it possible to model a graph's underlying distribution and thereby recreate similar copies. 
The ability to generate synthetic data has a number of applications across many domains, from enabling anonymization of sensitive data in social networks to augmenting the amount of available data on rare brain diseases. Current methods have long been limited to graphs with fewer than tens of thousands of nodes because they are not sufficiently scalable, but graphs such as social networks or transaction graphs can extend far beyond these limits. Furthermore, the current scalable approaches are predominantly based on stochasticity and do not capture local structures and clusters. In this report, we propose "Graphwave EdgeLinking CELL", or GELCELL, a three-step architecture for generating graphs at larger scale. Instead of recreating the entire graph directly, GELCELL partitions all the data and generates each cluster separately, which allows training that is both efficient and parallelizable. We can then connect the graph by encoding the nodes and training a model to predict the links between clusters, recreating a synthetic version of the original. The method requires certain assumptions about the maximum size of its clusters, but it is flexible and can accommodate domain knowledge about a specific graph in the form of informed parameter settings. Despite this, the results on varied training data show that GELCELL, given optimized parameters, is capable of generating graphs with reasonable accuracy up to the largest graph tested, with 400 000 nodes and 1 000 000 edges.
|
35 |
Exploring Graph Neural Networks for Clustering and Classification
Tahabi, Fattah Muhammad, 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Graph Neural Networks (GNNs) have become extremely popular and prominent deep learning techniques for analyzing structural graph data, owing to their ability to solve complex real-world problems. Because graphs provide an efficient way to model abstract hypothetical concepts, modern research overcomes the limitations of classical graph theory, which requires prior knowledge of the graph structure before traditional algorithms can be employed. GNNs, an impressive framework for representation learning on graphs, have already produced many state-of-the-art techniques for node classification, link prediction, and graph classification tasks. GNNs can learn meaningful representations of graphs by incorporating topological structure, node attributes, and neighborhood aggregation to solve supervised, semi-supervised, and unsupervised graph-based problems. In this study, the usefulness of GNNs is analyzed primarily from two aspects: clustering and classification. We focus on these two techniques because they are the most popular strategies in data mining for making sense of collected data and performing predictive analysis.
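The neighborhood aggregation mentioned in this abstract can be illustrated with one round of mean aggregation, in which each node's new representation is the average of its own feature vector and those of its neighbors. This is a deliberately stripped-down sketch with no learned weights or nonlinearity, so it is not a full GNN layer and not the models studied in the thesis; the toy graph, the feature values, and the `mean_aggregate` helper are hypothetical.

```python
def mean_aggregate(features, neighbors):
    """One round of mean neighborhood aggregation (self included, no learned weights)."""
    updated = {}
    for node, own in features.items():
        # Gather the node's own vector together with its neighbors' vectors.
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [
            sum(vec[k] for vec in group) / len(group) for k in range(len(own))
        ]
    return updated


# Hypothetical path graph a - b - c with 2-dimensional node attributes.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

h1 = mean_aggregate(features, neighbors)
print(h1)
```

Repeating the step lets information travel further along the graph, so after two rounds node a also reflects the attributes of node c. Practical GNN layers add a trainable transformation and a nonlinearity around this averaging.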
|
36 |
EXPLORING GRAPH NEURAL NETWORKS FOR CLUSTERING AND CLASSIFICATION
Fattah Muhammad Tahabi (14160375), 03 February 2023
Graph Neural Networks (GNNs) have become extremely popular and prominent deep learning techniques for analyzing structural graph data, owing to their ability to solve complex real-world problems. Because graphs provide an efficient way to model abstract hypothetical concepts, modern research overcomes the limitations of classical graph theory, which requires prior knowledge of the graph structure before traditional algorithms can be employed. GNNs, an impressive framework for representation learning on graphs, have already produced many state-of-the-art techniques for node classification, link prediction, and graph classification tasks. GNNs can learn meaningful representations of graphs by incorporating topological structure, node attributes, and neighborhood aggregation to solve supervised, semi-supervised, and unsupervised graph-based problems. In this study, the usefulness of GNNs is analyzed primarily from two aspects: clustering and classification. We focus on these two techniques because they are the most popular strategies in data mining for making sense of collected data and performing predictive analysis.
|