Global ETD Search

The Viability of Cluster Based Representations for Classification of Over the Counter Derivative Populations / Lämpligheten hos klustringsbaserade representationer av derivatkontraktspopulationer för klassificiering

A population of financial derivatives can be compressed if a subset of the derivatives yields a net cash flow between the parties involved that lies within a given tolerance. To compress a population correctly, all derivatives of the involved parties must be present in the derivative set. The current state of the art for ensuring this is to have analysts with domain expertise and experience analyze the populations with the help of supporting tools. The purpose of this project was to investigate whether this process can be automated using machine learning classification. Several ways of using clustering to represent a collection of derivatives were implemented and evaluated. The first representation clusters all derivatives across populations and represents a population by a vector describing how its derivatives are distributed over the clusters. The second representation uses the same clustering but instead forms a vector of the distances from the population to each of the clusters. These representations were compared with two naive representations: one that uses the mean derivative of a population as its representation, and one that uses a random clustering to compute a distribution analogous to the first representation. The representations were evaluated through classification with three different models (a Support Vector Machine, a Decision Tree, and a Naive Bayes classifier), in order to examine whether the representations generalize across models. Both proposed representations performed on par with the naive representations, indicating that they fail to capture the characteristics of missing derivatives. This was attributed to populations of derivatives varying so much from one another that the clustering is not consistent enough across populations.
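The four representations described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each derivative has already been encoded as a numeric feature vector, uses plain Lloyd's k-means as the clustering algorithm, interprets "the distance from a population to a cluster" as the Euclidean distance from the population's mean derivative to the cluster centroid, and models the random-clustering baseline as centroids drawn uniformly from the data's bounding box. All function names (`kmeans`, `histogram_repr`, `distance_repr`, `mean_repr`, `random_centroids`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
# Assumption: one population = a list of derivatives, each a numeric feature vector.
Population = Sequence[Sequence[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: Population) -> Vector:
    """Component-wise mean of a non-empty collection of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(p: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: _dist(p, centroids[j]))


def kmeans(points: Population, k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means over all derivatives pooled across populations."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster became empty.
        updated = [_mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def histogram_repr(population: Population, centroids: Sequence[Vector]) -> Vector:
    """Representation 1: share of the population's derivatives in each cluster."""
    counts = [0.0] * len(centroids)
    for d in population:
        counts[nearest(d, centroids)] += 1
    return [c / len(population) for c in counts]


def distance_repr(population: Population, centroids: Sequence[Vector]) -> Vector:
    """Representation 2 (assumed reading): distance from the population mean to each centroid."""
    m = _mean(population)
    return [_dist(m, c) for c in centroids]


def mean_repr(population: Population) -> Vector:
    """Naive baseline 1: the mean derivative of the population."""
    return _mean(population)


def random_centroids(points: Population, k: int, seed: int = 0) -> List[Vector]:
    """Naive baseline 2 (assumed form): k centroids drawn uniformly in the data's bounding box."""
    rng = random.Random(seed)
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    return [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(k)]
```

For example, clustering the pooled derivatives of all populations with `kmeans`, then calling `histogram_repr(population, centroids)` for each population, yields one fixed-length vector per population that a Support Vector Machine, Decision Tree, or Naive Bayes classifier can consume. Normalizing the histogram by population size keeps populations of different sizes comparable; whether the thesis normalized this way is not stated in the abstract.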

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210418

representation learning

Computer Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-210418
Date	January 2017
Creators	Nordberg, Marcus
Publisher	KTH, School of Computer Science and Communication (CSC)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
