  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Real-time Anomaly Detection on Financial Data

Martignano, Anna January 2020
This work investigates how Network Representation Learning (NRL) can be tailored to an application in the financial industry. NRL approaches are data-driven models that learn to encode graph structures into low-dimensional vector spaces, which downstream Machine Learning applications can then exploit. They can bring considerable benefit to the financial industry because they automatically extract features, called embeddings, that carry useful information about graph structure. Financial transactions can be represented as a network, and through NRL it is possible to extract embeddings that reflect the intrinsically interconnected nature of economic relationships. Such embeddings can serve several purposes, among them Anomaly Detection to fight financial crime. This work provides a qualitative analysis of state-of-the-art NRL models, which identifies Graph Convolutional Networks (ConvGNN) as the most suitable category of approaches for the financial industry, though with a certain need for further improvement. The financial industry poses additional challenges when modelling an NRL solution. Besides the need for a scalable solution that can handle real-world graphs of considerable size, several characteristics must be taken into account: transaction graphs are inherently dynamic, since new transactions are executed every day, and nodes can be heterogeneous. Everything is further complicated by the need for up-to-date information in (near) real time, owing to the sensitivity of the application domain. For these reasons GraphSAGE, an inductive ConvGNN model, was chosen as the base for the experiments. Two variants of GraphSAGE are presented: a dynamic variant whose weights evolve according to the input sequence of graph snapshots, and a variant specifically meant to handle bipartite graphs.
These variants have been evaluated by applying them to real-world data and using the generated embeddings to perform Anomaly Detection. The experiments demonstrate that these variants lead to results comparable with other state-of-the-art approaches, with the advantage of being suitable for real-world financial data sets. / (Translated from Swedish) This work presents an investigation of applications of Network Representation Learning (NRL) in the financial industry. NRL methods enable data-driven condensation of graph structures into low-dimensional, easily handled vectors. These vectors can then be used in other machine learning tasks. More specifically, NRL methods can facilitate the handling of, and information extraction from, computationally intensive and large-scale graphs in the financial sector, for example anomaly handling among financial transactions. Working with data of this type is made harder by the fact that transaction graphs are dynamic and constantly changing. In addition, the nodes, i.e. the transaction points, can differ widely or, in other words, originate from different distributions. In this work, Graph Convolutional Network (ConvGNN) has been judged the most suitable solution for the applications mentioned, which are aimed at detecting anomalies in transactions. GraphSAGE has been used as the starting point for the experiments in two variants: a dynamic version in which the weights are updated as new transaction sequences are fed in, and a variant intended specifically for bipartite graphs. These variants have been evaluated on real data sets with anomaly handling as the end goal.
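The abstract above builds on GraphSAGE, which computes a node's embedding from its own features and an aggregate of its neighbors' features. The following is a minimal sketch of one such layer with a mean aggregator, not the thesis's dynamic or bipartite variants. The function name `sage_mean_layer`, the toy graph, and the randomly drawn weights are all assumptions made for illustration; in practice the weights would be learned.

```python
import numpy as np

def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator (illustrative only)."""
    out = []
    for node, feat in enumerate(features):
        nbrs = neighbors.get(node, [])
        # Mean of the neighbors' features; a zero vector for isolated nodes.
        agg = features[nbrs].mean(axis=0) if nbrs else np.zeros_like(feat)
        # Separate weights for self and neighborhood, equivalent to
        # concatenating [feat, agg] and applying one stacked weight matrix.
        h = feat @ w_self + agg @ w_neigh
        out.append(np.maximum(h, 0.0))  # ReLU non-linearity
    out = np.array(out)
    # L2-normalize each embedding, guarding against all-zero rows.
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)

# Hypothetical toy graph: 4 nodes, 3 input features, 2-dimensional embeddings.
rng = np.random.default_rng(0)
features = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 0.0],
                     [3.0, 1.0, 1.0],
                     [0.5, 0.5, 0.5]])
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
w_self = rng.normal(size=(3, 2))   # untrained, random weights (assumption)
w_neigh = rng.normal(size=(3, 2))

emb = sage_mean_layer(features, neighbors, w_self, w_neigh)
print(emb.shape)
```

Because the layer depends only on a node's features and its neighborhood, the same weights can be applied to nodes that were not present during training, which is what makes the model inductive.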
2

Dynamic Graph Embedding on Event Streams with Apache Flink

Perini, Massimo January 2019
Graphs are often considered an excellent way of modeling complex real-world problems since they capture relationships between items. Because of their ubiquity, graph embedding techniques have occupied research groups seeking ways to encode vertices into a low-dimensional latent space that can then be used for machine learning. Recently, Graph Neural Networks (GNN) have dominated the field of embedding generation due to their inherent ability to encode latent node dependencies. Moreover, the newly introduced Inductive Graph Neural Networks have gained much popularity for inductively learning and representing node embeddings through neighborhood aggregate measures. Even when an entirely new node, unseen during training, appears in the graph, it can still be properly represented by its neighboring nodes. Although this approach appears suitable for dynamic graphs, available systems and training methodologies are agnostic of dynamicity and rely solely on re-processing full graph snapshots in batches, an approach that has been criticized for its high computational cost. This work provides a thorough solution to this particular problem via an efficient priority-based method for selecting rehearsed samples that guarantees low complexity and high accuracy. Finally, a data-parallel inference method has been evaluated at scale using Apache Flink, a data stream processor, for real-time predictions on high-volume graph data streams. / (Translated from Italian) Many real-world problems can be represented as graphs, since these data structures make it possible to model relationships between elements. Because of their widespread use, many research groups have sought to represent vertices in a low-dimensional space that can then be used with machine learning techniques. Graph neural networks have been widely used because of their ability to encode dependencies between vertices.
The recently introduced inductive neural networks have, moreover, gained popularity because they make it possible to generate vertex representations by aggregating other vertices. In this way, even a completely new node can still be represented using its neighboring nodes. Although this approach is suitable for dynamic graphs, the systems available today and the training algorithms rely exclusively on continually processing static graphs, an approach that has been criticized for its high computational cost. This thesis provides a solution to this problem through an efficient method for training inductive neural networks based on a heuristic for vertex selection. It also describes a method for making predictions scalably and in real time using Apache Flink, a system for processing large volumes of data streams in real time. / (Translated from Swedish) Graphs are often considered an excellent way to model complex real-world problems because they make it possible to capture relationships between objects. Because of their ubiquity, graph embedding techniques have occupied research groups investigating how vertices can be encoded into a low-dimensional latent space, which is useful for subsequently performing machine learning. Recently, Graph Neural Networks (GNN) have dominated the field of embedding generation thanks to their inherent ability to encode latent node dependencies. In addition, the newly introduced inductive graph neural networks gained great popularity for inductively learning and representing node embeddings through neighborhood aggregate measures. Even when an entirely new node, unseen during training, appears in the graph, it can still be properly represented by its neighboring nodes.
Although this approach appears suitable for dynamic graphs, available systems and training methodologies are agnostic to dynamics and rely only on processing full snapshots in batches, a method that has been criticized for its high computational cost. This work provides a thorough solution to this specific problem via an efficient priority-based method for selecting rehearsed samples that guarantees low complexity and high accuracy. Finally, a data-parallel inference method has been evaluated at scale with Apache Flink, a data stream processor, for real-time predictions on high-volume graph data streams.
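The abstract contrasts event-stream processing with re-processing full graph snapshots. As a rough illustration of that contrast only, the sketch below keeps a running sum of neighbor features per node, so that the mean neighborhood aggregate used by inductive models can be updated as each edge event arrives. It is not the thesis's priority-based rehearsal method and does not use Apache Flink; the class `StreamingMeanAggregator`, its methods, and the assumptions of static node features and undirected edges are all hypothetical.

```python
from collections import defaultdict

class StreamingMeanAggregator:
    """Keeps per-node neighbor-feature sums up to date as edges stream in."""

    def __init__(self, dim):
        self.dim = dim
        self.features = {}                                  # node -> feature list
        self.neigh_sum = defaultdict(lambda: [0.0] * dim)   # node -> running sum
        self.degree = defaultdict(int)                      # node -> neighbor count

    def add_node(self, node, feat):
        # Assumption: features are fixed once a node has been added.
        self.features[node] = [float(x) for x in feat]

    def add_edge(self, u, v):
        # Undirected edge: update both endpoints in O(dim), with no pass
        # over the rest of the graph.
        for a, b in ((u, v), (v, u)):
            feat_b = self.features[b]
            running = self.neigh_sum[a]
            for i in range(self.dim):
                running[i] += feat_b[i]
            self.degree[a] += 1

    def neighbor_mean(self, node):
        d = self.degree.get(node, 0)
        if d == 0:
            return [0.0] * self.dim
        return [x / d for x in self.neigh_sum[node]]

# Hypothetical event stream: node "c" arrives after "a" and "b" are connected.
agg = StreamingMeanAggregator(dim=2)
agg.add_node("a", [1, 0])
agg.add_node("b", [0, 1])
agg.add_edge("a", "b")
agg.add_node("c", [2, 2])
agg.add_edge("c", "a")
agg.add_edge("c", "b")

mean_c = agg.neighbor_mean("c")
mean_a = agg.neighbor_mean("a")
print(mean_c, mean_a)
```

The aggregate for the new node "c" is available as soon as its edges arrive, and the aggregates of "a" and "b" are adjusted by the same events rather than by recomputing a snapshot.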
