Global ETD Search

IDENTIFIKATION AV RISKINDIKATORER I FINANSIELL INFORMATION MED HJÄLP AV AI/ML : Ökade möjligheter för myndigheter att förebygga ekonomisk brottslighet / IDENTIFICATION OF INDICATORS FOR RISK IN FINANCIAL INFORMATION BY USING AI/ML : Improved possibilities for authorities to prevent economic crimes
Ahlm, Kristoffer, January 2021

Economic crime is more lucrative than other kinds of crime such as drug offences, handling of stolen goods, and human trafficking. Early measures that make it harder for criminals to use companies for criminal purposes can spare society large costs. A literature review showed major weaknesses in the collaboration between Swedish authorities in detecting serious economic crime. Today, such crimes are often discovered only after bankruptcy proceedings have begun. Machine learning models have been tested in studies to detect economic crime, and some Swedish authorities use machine learning models to detect crime, but more advanced methods are currently used by the Danish authorities.

Bolagsverket maintains an extensive register of companies in Sweden. This study investigates whether machine learning can be used to identify suspicious companies, using digitally submitted annual reports and information from Bolagsverket's register to train classification models. To train the models, indictments were collected from the Swedish Economic Crime Authority (Ekobrottsmyndigheten) and linked to specific companies among the submitted annual reports. Principal component analysis was used to visualize differences between suspected and non-suspected companies; the analyses showed overlap between the groups and no clear clustering.

The data were imbalanced, with 38 suspected companies out of 1009, so the oversampling technique SMOTE was used to generate synthetic data and increase the number of suspected companies. Two machine learning models, Random Forest and support vector machine (SVM), were compared in a 10-fold cross-validation. Both showed a recall of around 0.91, but Random Forest had much higher precision and higher accuracy. Random Forest was therefore selected and retrained, and it showed a recall of 0.75 when tested on unseen data consisting of 8 suspected companies out of 202. Lowering the decision threshold gave a higher recall but a larger number of misclassified companies. The study clearly shows the problem of imbalanced data and the challenges of working with a small dataset. A larger dataset would have allowed a stricter selection of crime types, which could have produced a more robust model that Bolagsverket could use to more easily identify suspicious companies in its register.
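The recall figure and the threshold trade-off reported in the abstract can be made concrete with a small sketch. The test-set shape (8 suspected companies among 202) and the recall of 0.75 come from the abstract. Everything else is an assumption made for illustration: the probability scores are invented, the `Confusion` class and `confusion_at` helper are hypothetical names, and the thresholds 0.5 and 0.3 are arbitrary. None of it is the thesis's data or model output.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of true/false positives and negatives for a binary classifier."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def recall(self) -> float:
        # Share of actual suspects that the model flags.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def precision(self) -> float:
        # Share of flagged companies that are actual suspects.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


def confusion_at(scores, labels, threshold):
    """Flag a company as suspect when its score is >= threshold, then count outcomes."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


# Hypothetical scores shaped like the abstract's test set: 8 suspects, 194 others.
suspect_scores = [0.90, 0.80, 0.70, 0.65, 0.60, 0.55, 0.40, 0.35]
other_scores = [0.60, 0.55] + [0.45] * 6 + [0.10] * 186
scores = suspect_scores + other_scores
labels = [1] * len(suspect_scores) + [0] * len(other_scores)

default = confusion_at(scores, labels, threshold=0.5)
lowered = confusion_at(scores, labels, threshold=0.3)

for name, c in (("threshold 0.5", default), ("threshold 0.3", lowered)):
    print(f"{name}: recall={c.recall:.2f} precision={c.precision:.2f} "
          f"accuracy={c.accuracy:.3f} false positives={c.fp}")
```

With these invented scores, the 0.5 threshold flags 6 of the 8 suspects (recall 0.75), and lowering it to 0.3 flags all 8 but raises the number of wrongly flagged companies from 2 to 8. Accuracy stays high in both cases because 194 of the 202 companies are not suspects, which is why recall and precision are the more informative measures on data this imbalanced.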
Keywords: financial crime; ecocrime; machine learning; fraud; risk; risk work; authorities; artificial intelligence; AI; ML; financial information; annual reports; prevention; criminality; detect; fraudster; fraudulent; detect economic crimes; risk indicators; prevent economic crime
Subject: Mathematics
