Peeking Through the Leaves : Improving Default Estimation with Machine Learning : A transparent approach using tree-based models

In recent years, the development and implementation of AI and machine learning models have increased dramatically, with the availability of quality data paving the way for sophisticated AI models. Financial institutions use many models in their daily operations. They are, however, heavily regulated and must follow the regulations set by the central banks' auditing standards and the financial supervisory authorities. One of these standards is IFRS 9, which governs the disclosure of expected credit losses in banks' financial statements. Banks must measure the expected credit shortfall in line with regulations set by the EBA and FSA. In this master's thesis, we collaborate with a Swedish bank to evaluate different machine learning models for predicting defaults in an unsecured credit portfolio. The default probability is a key variable in the expected credit loss equation. The goal is not only to develop a valid model for predicting these defaults but also to create and evaluate different models based on their performance and transparency. Given the regulatory challenges within AI, introducing transparency into models is part of the process. When banks use models, there is a transparency requirement, which refers to how easily a model can be understood in terms of its architecture, calculations, feature importance, and the logic behind its decision-making process. We have compared the commonly used logistic regression model with three machine learning models: decision tree, random forest, and XGBoost. We want to show the differences in performance and transparency between the machine learning models and the industry standard. We have introduced a transparency evaluation tool, called the transparency matrix, to shed light on the different transparency requirements of machine learning models. The results show that all of the tree-based machine learning models are a better choice of algorithm for estimating defaults than the traditional logistic regression.
This is shown in the AUC score as well as the R² metric. We also show that when models increase in complexity there is a performance-transparency trade-off: the more complex our models get, the better their predictions. / In recent years, the development and implementation of AI and machine learning models have increased dramatically. The availability of quality data paves the way for sophisticated AI models. Financial institutions use many models in their daily operations. They are, however, heavily regulated and must follow the rules established by the central banks' auditing standards and financial supervisory authorities. One of these standards is the disclosure of expected credit losses in banks' financial reports, called IFRS 9. Banks must measure the expected credit loss in line with rules established by the EBA and FSA. In this thesis, we collaborate with a Swedish bank to evaluate different machine learning models for predicting defaults in an unsecured credit portfolio. The probability of default is an important variable in the equation for expected credit losses. The goal is not only to develop a good model for predicting defaults, but also to create and evaluate different models based on their performance and transparency. Given the challenges within AI, the need to introduce transparency into models is part of the process. When banks use models, there are transparency requirements, which refer to how easily a model can be understood in terms of its architecture, calculations, variable influence, and the logic behind the decision process. We have compared the commonly used logistic regression model with three machine learning models: decision trees, random forest, and XGBoost. We want to show the differences in performance and transparency between machine learning models and the industry standard.
We have introduced a transparency evaluation tool called the transparency matrix to shed light on the different transparency requirements for machine learning models. The results show that all tree-based machine learning models are a better choice of model for predicting defaults than the traditional logistic regression. This is shown in the AUC score and the R² value. We also show that as models become more complex, a trade-off arises between performance and transparency; the more complex our models become, the better their predictions.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-210412
Date: January 2023
Creators: Hadad, Elias; Wigton, Angus
Publisher: Umeå universitet, Institutionen för matematik och matematisk statistik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess