This work presents an investigation of tailoring Network Representation Learning (NRL) for an application in the Financial Industry. NRL approaches are data-driven models that learn how to encode graph structures into low-dimensional vector spaces, which can be further exploited by downstream Machine Learning applications. They can potentially bring a lot of benefits in the Financial Industry since they extract in an automatic way features that can provide useful input regarding graph structures, called embeddings. Financial transactions can be represented as a network, and through NRL, it is possible to extract embeddings that reflect the intrinsic inter-connected nature of economic relationships. Such embeddings can be used for several purposes, among which Anomaly Detection to fight financial crime.This work provides a qualitative analysis over state-of-the-art NRL models, which identifies Graph Convolutional Network (ConvGNN) as the most suitable category of approaches for Financial Industry but with a certain need for further improvement. Financial Industry poses additional challenges when modelling a NRL solution. Despite the need of having a scalable solution to handle real-world graph with considerable dimensions, it is necessary to take into consideration several characteristics: transactions graphs are inherently dynamic since every day new transactions are executed and nodes can be heterogeneous. Besides, everything is further complicated by the need to have updated information in (near) real-time due to the sensitivity of the application domain. For these reasons, GraphSAGE has been considered as a base for the experiments, which is an inductive ConvGNN model. Two variants of GraphSAGE are presented: a dynamic variant whose weights evolve accordingly with the input sequence of graph snapshots, and a variant specifically meant to handle bipartite graphs. These variants have been evaluated by applying them to real-world data and leveraging the generated embeddings to perform Anomaly Detection. The experiments demonstrate that leveraging these variants leads toimagecomparable results with other state-of-the-art approaches, but having the advantage of being suitable to handle real-world financial data sets. / Detta arbete presenterar en undersökning av tillämpningar av Network Representation Learning (NRL) inom den finansiella industrin. Metoder inom NRL möjliggör datadriven kondensering av grafstrukturer till lågdimensionella och lätthanterliga vektorer.Dessa vektorer kan sedan användas i andra maskininlärningsuppgifter. Närmare bestämt, kan metoder inom NRL underlätta hantering av och informantionsutvinning ur beräkningsintensiva och storskaliga grafer inom den finansiella sektorn, till exempel avvikelsehantering bland finansiella transaktioner. Arbetet med data av denna typ försvåras av det faktum att transaktionsgrafer är dynamiska och i konstant förändring. Utöver detta kan noderna, dvs transaktionspunkterna, vara vitt skilda eller med andra ord härstamma från olika fördelningar.I detta arbete har Graph Convolutional Network (ConvGNN) ansetts till den mest lämpliga lösningen för nämnda tillämpningar riktade mot upptäckt av avvikelser i transaktioner. GraphSAGE har använts som utgångspunkt för experimenten i två olika varianter: en dynamisk version där vikterna uppdateras allteftersom nya transaktionssekvenser matas in, och en variant avsedd särskilt för bipartita (tvådelade) grafer. Dessa varianter har utvärderats genom användning av faktiska datamängder med avvikelsehantering som slutmål.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-281832 |
Date | January 2020 |
Creators | Martignano, Anna |
Publisher | KTH, Skolan för elektroteknik och datavetenskap (EECS) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-EECS-EX ; 2020:643 |
Page generated in 0.0025 seconds