Global ETD Search

1	Towards Machine Learning Enabled Automatic Design of IT-Network Architectures Wåhlin, Lova January 2019 (has links) There are many machine learning techniques that cannot be performed on graph-data. Techniques such as graph embedding, i.e mapping a graph to a vector, can open up a variety of machine learning solutions. This thesis addresses to what extent static graph embedding techniques can capture important characteristics of an IT-architecture graph, with the purpose of embedding the graphs in a common euclidean vector space that can serve as the state space in a reinforcement learning setup. The metric used for evaluating the performance of the embedding is the security of the graph, i.e the time it would take for an unauthorized attacker to penetrate the IT-architecture graph. The algorithms evaluated in this work are the node embedding methods node2vec and gat2vec and the graph embedding method graph2vec. The predictive results of the embeddings are compared with two baseline methods. The results of each of the algorithms mostly display a significant predictive performance improvement compared to the baseline, where the F1 score in some cases is doubled. Indeed, the results indicate that static graph embedding methods can in fact capture some information about the security of an IT-architecture. However, no conclusion can be made whether a static graph embedding is actually the best contender for posing as the state space in a reinforcement learning framework. To make a certain conclusion other options has to be researched, such as dynamic graph embedding methods. / Det är många maskininlärningstekniker som inte kan appliceras på data i form av en graf. Tekniker som graph embedding, med andra ord att mappa en graf till ett vektorrum, can öppna upp för en större variation av maskininlärningslösningar. Det här examensarbetet evaluerar hur väl statiska graph embeddings kan fånga viktiga säkerhetsegenskaper hos en IT-arkitektur som är modellerad som en graf, med syftet att användas i en reinforcement learning algoritm. Dom egenskaper i grafen som används för att validera embedding metoderna är hur lång tid det skulle ta för en obehörig attackerare att penetrera IT-arkitekturen. Algorithmerna som implementeras är node embedding metoderna node2vec och gat2vec, samt graph embedding metoden graph2vec. Dom prediktiva resultaten är jämförda med två basmetoder. Resultaten av alla tre metoderna visar tydliga förbättringar relativt basmetoderna, där F1 värden i några fall uppvisar en fördubbling. Det går alltså att dra slutsatsen att att alla tre metoder kan fånga upp säkerhetsegenskaper i en IT-arkitektur. Dock går det inte att säga att statiska graph embeddings är den bästa lösningen till att representera en graf i en reinforcement learning algoritm, det finns andra komplikationer med statiska metoder, till exempel att embeddings från dessa metoder inte kan generaliseras till data som inte var använd till träning. För att kunna dra en absolut slutsats krävs mer undersökning, till exempel av dynamiska graph embedding metoder. IT-Architecture graph Node Embedding Graph Embedding Reinforcement Learning Machine Learning IT-Arkitektur Node Embedding Graph Embedding Reinforcement learning Maskininlärning Computational Mathematics Beräkningsmatematik
2	Systematic Assessment of Structural Features-Based Graph Embedding Methods with Application to Biomedical Networks Zhu, Xiaoting 04 November 2020 (has links) No description available. Computer Science graph node embedding graph structural analysis link prediction network science biomedical networks
3	Constructing and representing a knowledge graph(KG) for Positive Energy Districts (PEDs) Davari, Mahtab January 2023 (has links) In recent years, knowledge graphs(KGs) have become essential tools for visualizing concepts and retrieving contextual information. However, constructing KGs for new and specialized domains like Positive Energy Districts (PEDs) presents unique challenges, particularly when dealing with unstructured texts and ambiguous concepts from academic articles. This study focuses on various strategies for constructing and inferring KGs, specifically incorporating entities related to PEDs, such as projects, technologies, organizations, and locations. We utilize visualization techniques and node embedding methods to explore the graph's structure and content and apply filtering techniques and t-SNE plots to extract subgraphs based on specific categories or keywords. One of the key contributions is using the longest path method, which allows us to uncover intricate relationships, interconnectedness between entities, critical paths, and hidden patterns within the graph, providing valuable insights into the most significant connections. Additionally, community detection techniques were employed to identify distinct communities within the graph, providing further understanding of the structural organization and clusters of interconnected nodes with shared themes. The paper also presents a detailed evaluation of a question-answering system based on the KG, where the Universal Sentence Encoder was used to convert text into dense vector representations and calculate cosine similarity to find similar sentences. We assess the system's performance through precision and recall analysis and conduct statistical comparisons of graph embeddings, with Node2Vec outperforming DeepWalk in capturing similarities and connections. For edge prediction, logistic regression, focusing on pairs of neighbours that lack a direct connection, was employed to effectively identify potential connections among nodes within the graph. Additionally, probabilistic edge predictions, threshold analysis, and the significance of individual nodes were discussed. Lastly, the advantages and limitations of using existing KGs(Wikidata and DBpedia) versus constructing new ones specifically for PEDs were investigated. It is evident that further research and data enrichment is necessary to address the scarcity of domain-specific information from existing sources. Knowledge graph Positive Energy Districts (PEDs) longest path Questions and Answers Community Detection Node Embedding t-SNE plots Edge Prediction Computer Sciences Datavetenskap (datalogi)
4	Using machine learning to visualize and analyze attack graphs Cottineau, Antoine January 2021 (has links) In recent years, the security of many corporate networks have been compromised by hackers who managed to obtain important information by leveraging the vulnerabilities of those networks. Such attacks can have a strong economic impact and affect the image of the entity whose network has been attacked. Various tools are used by network security analysts to study and improve the security of networks. Attack graphs are among these tools. Attack graphs are graphs that show all the possible chains of exploits an attacker could follow to access an important host on a network. While attack graphs are useful for network security, they may become hard to read because of their size when networks become larger. Previous work tried to deal with this issue by applying simplification algorithms on graphs. Experience shows that even if these algorithms can help improve the visualization of attack graphs, we believe that improvements can be made, especially by relying on Machin Learning (ML) algorithms. Thus, the goal of this thesis is to investigate how ML can help improve the visualization of attack graphs and the security analysis of networks based on their attack graph. To reach this goal, we focus on two main areas. First we used graph clustering which is the process of creating a partition of the nodes based on their position in the graph. This improves visualization by allowing network analysts to focus on a set of related nodes instead of visualizing the whole graph. We also design several metrics for security analysis based on attack graphs. We show that the ML algorithms in both areas. The ML clustering algorithms even produce better clusters than non-ML algorithms with respect to the coverage metric, at the cost of computation time. Moreover, the ML security evaluation algorithms show faster computation times on dense attack graphs than the non-ML baseline, while producing similar results. Finally, a user interface that permits the application of the methods presented in the thesis is also developed, with the goal of making the use of such methods easier by network analysts. / Under de senaste åren har säkerheten för många företagsnätverk äventyrats av hackare som lyckats få fram viktig information genom att utnyttja sårbarheterna i dessa nätverk. Sådana attacker kan ha en stark ekonomisk inverkan och påverka bilden av den enhet vars nätverk har angripits. Olika verktyg användes av nätverkssäkerhetsanalytiker för att studera och förbättra säkerheten i nätverken. Attackgrafer ät bland dessa verktyg. Attackgrafer är diagram som visar alla möjliga kedjor av utnyttjande en angripare kan följa för att komma åt en viktig värd i ett nätverk. Även om attackgrafer är användbara för nätverkssäkerhet, kan de bli svåra att läsa på grund av deras storlek när nätverk blir större. Tidigare arbete försökte hantera detta problem genom att tillämpa förenklingsalgoritmer på grafer. Erfarenheten visar att även om dessa algoritmer kan hjälpa till att förbättra visualiseringen av attackgrafer tror vi att förbättringar kan göras, särskilt genom att förlita sig på Machine Learning (ML) algoritmer. Således är målet med denna avhandling att undersöka hur ML kan hjälpa till att förbättra visualiseringen av attackgrafer och säkerhetsanalys av nätverk baserat på deras attackgraf. För att nå detta mål fokuserar vi på två huvudområden. Först använder vi grafklustering som är processen för att skapa en partition av noderna baserat på deras position i grafen. Detta förbättrar visualiseringen genom att låta nätverksanalytiker fokusera på en uppsättning relaterade noder istället för att visualisera hela grafen. Vi utformar också flera mätvärden för säkerhetsanalys baserat på attackgrafer. Vi visar att ML-algoritmerna är lika effektiva som icke-LM-algoritmer inom båda områdena. Klusteringsalgoritmerna ML producerar till och med bättre kluster än icke-ML-algoritmer med avseende på täckningsvärdet, till kostnaden för beräkningstid. Dessutom visar ML säkerhetsutvärderingsalgoritmerna snabbare beräkningstider på täta attackgrafer än icke-ML baslinjen, samtidigt som de ger liknande resultat. Slutligen utvecklas också ett användargränssnitt som tillåter tillämpning av metoderna som presenteras i avhandlingen, med målet att göra användningen av sådana metoder enklare för nätverksanalytiker. Attack graph Graph visualization Graph clustering Network security Node embedding Attackgraf grafvisualisering grafklustering nätverkssäkerhet inbäddning av noder Computer and Information Sciences Data- och informationsvetenskap
5	Synthetic Graph Generation at Scale : A novel framework for generating large graphs using clustering, generative models and node embeddings / Storskalig generering av syntetiska grafer : En ny arkitektur för att tillverka stora grafer med hjälp av klustring, generativa modeller och nodinbäddningar Hammarstedt, Johan January 2022 (has links) The field of generative graph models has seen increased popularity during recent years as it allows us to model the underlying distribution of a network and thus recreate it. From allowing anonymization of sensitive information in social networks to data augmentation of rare diseases in the brain, the ability to generate synthetic data has multiple applications in various domains. However, most current methods face the bottleneck of trying to generate the entire adjacency matrix and are thus limited to graphs with less than tens of thousands of nodes. In contrast, large real-world graphs like social networks or transaction graphs can extend significantly beyond these boundaries. Furthermore, the current scalable approaches are predominantly based on stochasticity and do not capture local structures and communities. In this paper, we propose Graphwave Edge-Linking CELL or GELCELL, a novel three-step architecture for generating graphs at scale. First, instead of constructing the entire network, GELCELL partitions the data and generates each cluster separately, allowing for efficient and parallelizable training. Then, by encoding the nodes, it trains a classifier to predict the edges between the partitions to patch them together, creating a synthetic version of the original large graph. Although it does suffer from some limitations due to necessary constraints on the cluster sizes, the results showed that GELCELL, given optimized parameters, can produce graphs with reasonable accuracy on all data tested, with the largest having 400 000 nodes and 1 000 000 edges. / Generativa grafmodeller har sett ökad popularitet under de senaste åren eftersom det möjliggör modellering av grafens underliggande distribution, och vi kan på så sätt återskapa liknande kopior. Förmågan att generera syntetisk data har ett flertal applikationsområden i en mängd av områden, allt från att möjligöra anonymisering av känslig data i sociala nätverk till att utöka mängden tillgänglig data av ovanliga hjärnsjukdomar. Dagens metoder har länge varit begränsade till grafer med under tiotusental noder, då dessa inte är tillräckligt skalbara, men grafer som sociala nätverk eller transaktionsgrafer kan sträcka sig långt utöver dessa gränser. Dessutom är de nuvarande skalbara tillvägagångssätten till största delen baserade på stokasticitet och fångar inte lokala strukturer och kluster. I denna rapport föreslår vi ”Graphwave EdgeLinking CELL” eller GELCELL, en trestegsarkitektur för att generera grafer i större skala. Istället för att återskapa hela grafen direkt så partitionerar GELCELL all datat och genererar varje kluster separat, vilket möjliggör både effektiv och parallelliserbar träning. Vi kan sedan koppla samman grafen genom att koda noderna och träna en modell för att prediktera länkarna mellan kluster och återskapa en syntetisk version av originalet. Metoden kräver vissa antaganden gällande max-storleken på dess kluster men är flexibel och kan rymma domänkännedom om en specifik graf i form av informerad parameterinställning. Trots detta visar resultaten på varierade träningsdata att GELCELL, givet optimerade parametrar, är kapabel att genera grafer med godtycklig precision upp till den största beprövade grafen med 400 000 noder och 1 000 000 länkar. Data Anonymization Graph Learning Generative Graph Modeling Graph Clustering Node Embedding Synthetic Data Dataanonymisering Grafinlärning Generativa graf-modeller Graf klustring Länk prediktion Nodinbäddning Syntetisk data Computer and Information Sciences Data- och informationsvetenskap
6	Flight search engine CPU consumption prediction Tao, Zhaopeng January 2021 (has links) The flight search engine is a technology used in the air travel industry. It allows the traveler to search and book for the best flight options, such as the combination of flights while keeping the best services, options, and price. The computation for a flight search query can be very intensive given its parameters and complexity. The project goal is to predict the flight search queries computation cost for a new flight search engine product when dealing with parameters change and optimizations. The problem of flight search cost prediction is a regression problem. We propose to solve the problem by delimiting the problem based on its business logic and meaning. Our problem has data defined as a graph, which is why we have chosen Graph Neural Network. We have investigated multiple pretraining strategies for the evaluation of node embedding concerning a realworld regression task, including using a line graph for the training. The embeddings are used for downstream regression tasks. Our work is based on some stateoftheart Machine Learning, Deep Learning, and Graph Neural Network methods. We conclude that for some business use cases, the predictions are suitable for production use. In addition, the prediction of tree ensemble boosting methods produces negatives predictions which further degrade the R2 score by 4% because of the business meaning. The Deep Neural Network outperformed the most performing Machine Learning methods by 8% to 12% of R2 score. The Deep Neural Network also outperformed Deep Neural Network with pretrained node embedding from the Graph Neural Network methods by 11% to 17% R2 score. The Deep Neural Network achieved 93%, 81%, and 63% R2 score for each task with increasing difficulty. The training time range from 1 hour for Machine Learning models, 2 to 10 hours for Deep Learning models, and 8 to 24 hours for Deep Learning model for tabular data trained end to end with Graph Neural Network layers. The inference time is around 15 minutes. Finally, we found that using Graph Neural Network for the node regression task does not outperform Deep Neural Network. / Flygsökmotor är en teknik som används inom flygresebranschen. Den gör det möjligt för resenären att söka och boka de bästa flygalternativen, t.ex. kombinationer av flygningar med bästa service, alternativ och pris. Beräkningen av en flygsökning kan vara mycket intensiv med tanke på dess parametrar och komplexitet. Projektets mål är att förutsäga beräkningskostnaden för flygsökfrågor för en ny produkt för flygsökmotor när parametrar ändras och optimeringar görs. Problemet med att förutsäga kostnaderna för flygsökning är ett regressionsproblem. Vi föreslår att man löser problemet genom att avgränsa det utifrån dess affärslogik och innebörd. Vårt problem har data som definieras som en graf, vilket är anledningen till att vi har valt Graph Neural Network. Vi har undersökt flera förträningsstrategier för utvärdering av nodinbäddning när det gäller en regressionsuppgift från den verkliga världen, bland annat genom att använda ett linjediagram för träningen. Inbäddningarna används för regressionsuppgifter i efterföljande led. Vårt arbete bygger på några toppmoderna metoder för maskininlärning, djupinlärning och grafiska neurala nätverk. Vi drar slutsatsen att förutsägelserna är lämpliga för produktionsanvändning i vissa Vi drar slutsatsen att förutsägelserna är lämpliga för produktionsanvändning i vissa fall. Dessutom ger förutsägelserna från trädens ensemble av boostingmetoder negativa förutsägelser som ytterligare försämrar R2poängen med 4% på grund av affärsmässiga betydelser. Deep Neural Network överträffade de mest effektiva metoderna för maskininlärning med 812% av R2poängen. Det djupa neurala nätverket överträffade också det djupa neurala nätverket med förtränad node embedding från metoderna för grafiska neurala nätverk med 11 till 17% av R2poängen. Deep Neural Network uppnådde 93, 81 och 63% R2poäng för varje uppgift med stigande svårighetsgrad. Träningstiden varierar från 1 timme för maskininlärningsmodeller, 2 till 10 timmar för djupinlärningsmodeller och 8 till 24 timmar för djupinlärningsmodeller för tabelldata som tränats från början till slut med grafiska neurala nätverkslager. Inferenstiden är cirka 15 minuter. Slutligen fann vi att användningen av Graph Neural Network för uppgiften om regression av noder inte överträffar Deep Neural Network. Flight Search Engine Deep Neural Networks Tabular Data Regression Machine Learning Flight Schedule Embedding Node Embedding Graph Embedding Line Graph Embedding Sökmotor för flygresor Djupa neurala nätverk Tabulära data Regression Maskininlärning Inbäddning av flygschema Inbäddning av noder Inbäddning av grafer Inbäddning av linjediagram Computer and Information Sciences Data- och informationsvetenskap

1

Page generated in 0.0719 seconds