  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Ensembles in relational classification

Nils Ever Murrugarra Llerena 08 September 2011 (has links)
In many domains there is, besides information about the objects or entities that compose them, also information about the relationships between those objects; co-authorship networks and Web pages are two examples. It is therefore natural to look for classification techniques that take this relational information into account. Among them are the so-called graph-based classification techniques, which classify examples by exploiting the relationships between them. This work develops methods to improve the performance of graph-based classifiers using ensemble strategies. An ensemble classifier combines, in some way, the predictions of a set of individual classifiers, and usually performs better than any of its members alone. Three techniques were developed: the first for data originally in propositional format and transformed into a graph-based relational format, and the second and third for data originally in graph format. The first technique, inspired by the boosting algorithm, produced the Adaptive Graph-Based K-Nearest Neighbor (A-KNN) algorithm. The second, inspired by the bagging algorithm, led to three approaches of Graph-Based Bagging (BG). Finally, the third, inspired by the Cross-Validated Committees algorithm, produced the Graph-Based Cross-Validated Committees (CVCG). The experiments were performed on 38 data sets, 22 propositional and 16 relational, evaluated with 10-fold stratified cross-validation; statistical differences between classifiers were determined with the method proposed by Demsar (2006). All three techniques improved or maintained the performance of the base classifiers. In conclusion, ensembles applied to graph-based classifiers yield good performance.
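As a concrete illustration of the bagging strategy the abstract adapts to graphs, the sketch below shows classical (propositional) bagging: each base learner is trained on a bootstrap resample and their votes are combined by majority. The one-dimensional threshold learner and the toy data are illustrative stand-ins, not the thesis's graph-based classifiers.

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier: predict 1 on the side of
    the midpoint between the two class means."""
    ones = [x for x, y in points if y == 1]
    zeros = [x for x, y in points if y == 0]
    mean1 = sum(ones) / max(1, len(ones))
    mean0 = sum(zeros) / max(1, len(zeros))
    thr = (mean0 + mean1) / 2.0
    sign = 1 if mean1 >= mean0 else -1
    return lambda x: 1 if sign * (x - thr) >= 0 else 0

def bagging_ensemble(points, n_models=25, seed=0):
    """Train each base learner on a bootstrap resample and
    combine their predictions by majority vote."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(points) for _ in points])
              for _ in range(n_models)]
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

data = [(x, 0) for x in (1.0, 1.5, 2.0, 2.5)] + [(x, 1) for x in (4.0, 4.5, 5.0, 5.5)]
predict = bagging_ensemble(data)
print([predict(x) for x in (1.2, 5.2)])
```

The boosting-inspired A-KNN variant would instead reweight examples between rounds rather than resample them uniformly.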
52

Fog and fog deposition: A novel approach to estimate the occurrence of fog and the amount of fog deposition: a case study for Germany

Körner, Philipp 07 December 2021 (has links)
This thesis is written as a cumulative dissertation. It presents methods and results that contribute to an improved understanding of the spatio-temporal variability of fog and fog deposition. The questions to be answered are: when does fog occur and how much, and where and how much fog is deposited on the vegetation as fog precipitation? Freely available data sets serve as the data basis. The meteorological input data are obtained from the Climate Data Center (CDC) of the German Meteorological Service (DWD): station data for temperature, relative humidity and wind speed in hourly resolution, plus visibility data for validation purposes. Furthermore, Global Forest Heights (GFH) data from the National Aeronautics and Space Administration (NASA) are used as vegetation height data, and data from NASA's Shuttle Radar Topography Mission (SRTM) serve as the digital elevation model. The first publication deals with gap filling and data compression for further calculations. This is necessary since the station density for hourly data is relatively low, especially before the 2000s, and hourly data have more frequent gaps than, for instance, daily data. It is shown that gradient boosting (gb) enables high-quality gap filling in a short computing time. The second publication deals with the determination of fog, in particular the liquid water content (lwc). Here the focus is on the correction of measurement errors in the relative humidity and on methods of spatial interpolation. The resulting lwc data for Germany, with a temporal resolution of one hour and a spatial resolution of one kilometre, are validated against measured lwc data as well as visibility data of the DWD.
The last publication builds on the data and methods of the two previous ones, adding the vegetation and wind speed data to derive fog precipitation from the lwc data. This is validated using data from other publications and water balance calculations. In addition to the measured precipitation, the fog precipitation data are used as an input variable for the modelling. This is also one of the possible applications: to determine precipitation from fog, which standard measuring methods do not record, and thus to make water balance modelling more realistic. Contents: 1 Motivation; 2 Problem definition and target setting; 3 Structure; 4 Model limits; 5 Publications; 6 Outlook.
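The gradient-boosting gap filling described in the first publication can be sketched in miniature: least-squares boosting of one-split regression stumps, learning a target station's readings from a neighbouring station and then filling a missing hour. The stations and values are invented for illustration; the thesis's actual setup (hourly DWD station data) is far richer.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on a 1-D feature:
    pick the threshold minimizing the squared error of the
    left/right mean predictions."""
    best = None
    order = sorted(set(xs))
    for i in range(len(order) - 1):
        thr = (order[i] + order[i + 1]) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def gradient_boost(xs, ys, n_rounds=50, rate=0.1):
    """Least-squares gradient boosting: start from the mean and
    repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + rate * sum(s(x) for s in stumps)

# Toy gap filling: the target station runs roughly 2 degrees colder
# than its neighbour; learn that mapping and fill a missing hour.
neighbour = [3.0, 5.0, 8.0, 12.0, 15.0, 18.0]
target    = [1.1, 2.9, 6.2, 10.0, 13.1, 15.8]
model = gradient_boost(neighbour, target)
print(round(model(10.0), 1))  # filled value for the missing hour
```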
53

Credit risk classification models for real estate financing: logistic regression, discriminant analysis, decision trees, bagging and boosting

Lopes, Neilson Soares 08 August 2011 (has links)
This study applied the traditional parametric techniques of discriminant analysis and logistic regression to the credit analysis of real estate financing transactions in which borrowers may or may not also hold a payroll loan. The hit rates of these methods were compared with those of non-parametric techniques based on classification trees, and with the meta-learning methods bagging and boosting, which combine classifiers to improve accuracy. In a context of a large housing deficit, especially in Brazil, real estate financing can still be strongly encouraged; sustainable growth in mortgage credit brings not only economic but also social benefits. Housing is, for most individuals, the largest source of expenditure and the most valuable asset they will own during their lifetime. The study concluded that decision trees are the most effective technique for predicting bad payers (94.2% correct), followed by bagging (80.7%) and boosting (or arcing, 75.2%); for predicting bad payers in mortgages, logistic regression and discriminant analysis showed the worst results (74.6% and 70.7%, respectively). For good payers, the decision tree again showed the best predictive power (75.8%), followed by discriminant analysis (75.3%) and boosting (72.9%); bagging and logistic regression performed worst (72.1% and 71.7%, respectively). The logistic regression shows that, for a borrower with a payroll loan, the odds of being a bad payer are 2.19 times higher than for a borrower without one. The presence of payroll loans among the operations of mortgage borrowers is also relevant in the discriminant analysis.
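The reported figure of 2.19 follows from the standard interpretation of a logistic regression coefficient as an odds ratio: for a binary predictor, exp(coefficient) multiplies the odds of the outcome. A minimal sketch (the baseline odds value is invented for illustration):

```python
import math

# In a logistic model log(p / (1 - p)) = b0 + b1 * has_payroll_loan,
# the odds ratio for the payroll-loan indicator is exp(b1).
def odds_ratio(beta):
    return math.exp(beta)

# Coefficient implied by the reported odds ratio of 2.19:
beta_payroll = math.log(2.19)
print(round(odds_ratio(beta_payroll), 2))  # 2.19 by construction

# Effect on the default probability for a borrower whose
# baseline odds of default are 0.25 (hypothetical):
baseline_odds = 0.25
new_odds = baseline_odds * odds_ratio(beta_payroll)
print(round(new_odds / (1 + new_odds), 3))
```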
54

Breeding white storks in former East Prussia : comparing predicted relative occurrences across scales and time using a stochastic gradient boosting method (TreeNet), GIS and public data

Wickert, Claudia January 2007 (has links)
Different habitat models were created for the White Stork (Ciconia ciconia) in the former German province of East Prussia (corresponding approximately to the present-day Russian oblast of Kaliningrad and the Polish voivodeship of Warmia-Masuria). Historical data sets describing the occurrence of the White Stork in the 1930s, together with selected variables describing landscape and habitat, were employed. Data processing and modeling were carried out with a geographical information system (ArcGIS) and a statistical modeling approach from the disciplines of machine learning and data mining (TreeNet by Salford Systems Ltd.). Applying the historical habitat descriptors and the data on White Stork occurrence, models were built on two scales: (i) a point-scale model using a raster with a cell size of 1 km2, and (ii) an administrative-district-scale model based on the division of the former province of East Prussia into its districts. The evaluation of the models shows that the occurrence of White Stork nesting grounds in former East Prussia is, for the variables considered, largely determined by 'forest', 'settlement area', 'pasture land' and 'proximity to coastline'. It can therefore be assumed that a good food supply, as the White Stork finds in pastures and meadows, and proximity to human settlements are decisive for its nest-site choice, while dense forest areas are unsuitable as nesting grounds. The strong influence of the 'coastline' variable is most likely explained by the landscape structure of East Prussia running parallel to the coastline, and should be seen as a proximate factor explaining the distribution of breeding White Storks.
In a second step, the models on both scales were used to make predictions for the period 1981-1993. On the point scale a decline in potential nesting habitat was predicted, whereas the district-scale model predicted an increase in White Stork occurrence. The difference between the two predictions probably stems from the use of different scales (density versus suitability as breeding ground) and partly dissimilar explanatory variables; further studies are needed to investigate this. The model predictions for 1981-1993 were also compared descriptively with the censuses available for that period. The figures predicted here were higher than those established by the censuses, so the models describe rather the capacity of the habitat (the potential niche); other factors determining population size, such as breeding success or mortality, should be included in future studies. The work demonstrates a feasible approach for building valuable habitat models from historical data with the methods presented here, and for assessing the effects of land-use change on the White Stork. The models are a first basis and can be refined with further data on habitat structure and with more exact, spatially explicit information on nest locations. As a further step, a habitat model for the present day should be created. This would allow a better comparison of the effects of changes in land use and relevant environmental conditions on the White Stork in the region of former East Prussia and across its whole range, for example in the light of coming landscape changes brought by the European Union (EU).
55

Evaluation of the Gradient Boosting algorithm in short-term electric load forecasting applications

Mayrink, Victor Teixeira de Melo 31 August 2016 (has links)
Large-scale storage of electrical energy is still not feasible due to technical and economic constraints. Therefore, all energy consumed must be produced instantly: surplus production cannot be stored, nor can supply shortages be covered from safety stocks, even for a short period of time. Consequently, one of the main challenges of energy planning is to produce accurate forecasts of future demand. In this work, we present a model for short-term load forecasting. The methodology builds a forecasting committee by applying the Gradient Boosting algorithm in combination with decision-tree models and the exponential smoothing technique. This strategy is a supervised learning method that fits the forecasting model on historical energy-consumption data, recorded temperatures, and calendar variables. The proposed models were tested on two different datasets and performed very well when compared with results published in other recent works.
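One ingredient of the committee, simple exponential smoothing, can be sketched in a few lines; the load values are invented for illustration and the smoothing constant is arbitrary:

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: each smoothed value is a
    weighted blend of the new observation and the previous level."""
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# Hypothetical hourly load readings (MW):
load = [100.0, 104.0, 98.0, 110.0, 107.0]
print([round(v, 2) for v in exponential_smoothing(load)])
```

In a committee, the smoothed level serves as one forecast to be combined with the tree-based predictions.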
56

Design and Analysis of Techniques for Multiple-Instance Learning in the Presence of Balanced and Skewed Class Distributions

Wang, Xiaoguang January 2015 (has links)
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, the Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Existing knowledge discovery and data analysis techniques have shown great success in many real-world applications, such as Automatic Target Recognition (ATR) methods for detecting targets of interest in imagery, drug activity prediction, and computer vision. Among these techniques, Multiple-Instance (MI) learning differs from standard classification in that its input is a set of bags, each containing many instances. The instances in a bag are not labeled; instead, the bags themselves are labeled. Much work and progress have been accomplished in this area, but some problems remain open. This thesis focuses on two topics in MI learning: (1) investigating the relationship between MI learning and other multiple-pattern learning methods, namely multi-view learning, data fusion, and multi-kernel SVM; and (2) dealing with the class imbalance problem in MI learning. For the first topic, three different learning frameworks are presented for general MI learning: the first uses multiple-view approaches, the second is a data fusion framework, and the third, an extension of the first, uses multiple-kernel SVM. Experimental results show that the presented approaches work well on the MI problem. The second topic concerns the imbalanced MI problem; here we investigate the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews.
For this problem, we propose three solution frameworks: a data re-sampling framework, a cost-sensitive boosting framework, and an adaptive instance-weighted boosting SVM (named IB_SVM) for MI learning. Experimental results, on both benchmark and application datasets, show that the proposed frameworks are effective solutions for the imbalanced MI problem.
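The bag/instance setup can be made concrete with the standard MI assumption: a bag is positive if at least one of its instances is positive, so the bag score is the maximum over instance scores. The instance scorer below is a hypothetical stand-in, not one of the thesis's frameworks:

```python
def bag_predict(bag, instance_score, threshold=0.5):
    """Standard multiple-instance assumption: the bag is positive
    if any instance scores above the threshold (max-pooling over
    instance scores)."""
    return max(instance_score(inst) for inst in bag) > threshold

# Hypothetical instance scorer for 1-D instances.
score = lambda x: x / 10.0

positive_bag = [1.0, 2.0, 9.0]   # one strongly positive instance
negative_bag = [1.0, 2.0, 3.0]   # no positive instances
print(bag_predict(positive_bag, score), bag_predict(negative_bag, score))
```

Note that only the bag-level labels would ever be available for training; the per-instance labels stay hidden, which is what separates MI learning from standard classification.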
57

Evaluating UVB and UVA Boosting Technologies for Chemical and Physical Sunscreens

Huynh, An Ngoc Hiep January 2020 (has links)
No description available.
58

Machine Learning in Algorithmic Trading

Bureš, Michal January 2021 (has links)
This thesis is dedicated to the application of machine learning methods to algorithmic trading. We take inspiration from intraday traders and implement a system that predicts future price based on candlestick patterns and technical indicators. Using forex and US stock tick data, we create multiple aggregated bar representations. From these bars we construct original features based on candlestick-pattern clustering by K-Means, and long-term features derived from standard technical indicators. We then set up regression and classification tasks for Extreme Gradient Boosting models and extract buy and sell trading signals from their predictions. We perform experiments with eight different configurations over multiple assets and trading strategies using walk-forward validation. The results report Sharpe ratios and mean profits of all the combinations; we discuss them and recommend suitable configurations. Overall, our strategies outperform randomly selected strategies. Furthermore, we provide and discuss multiple opportunities for further research.
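The walk-forward validation used in the experiments can be sketched as a sliding train/test window over time-ordered samples, so no future data leaks into training; the window sizes here are arbitrary:

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield successive (train_indices, test_indices) windows:
    fit on one span of history, test on the span that follows,
    then slide both windows forward by the test length."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size

splits = list(walk_forward_splits(10, train_size=4, test_size=2))
for train, test in splits:
    print(train, "->", test)
```

Each split would retrain the boosting model on `train` and record the strategy's profit on `test`, giving an out-of-sample performance series.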
59

Sentiment analysis of a Swedish stock trading forum for predicting stock market movement

Ouadria, Michel Sebastian, Ciobanu, Ann-Stephanie January 2020 (has links)
The present study examines the possibility of predicting daily stock movement with sentiment analysis of posts in a Swedish stock trading forum. Sentiment analysis is used to extract subjectivity in the form of emotions (sentiment) from text. Text data were extracted from the forum to predict the price movement of the related share, with all data aggregated over a fixed period of two years. Machine learning was used to train three models on the text data and stock data. The results showed no clear correlation between sentiment and stock movement, and did not reach the accuracy reported in previous work in the field; the highest accuracy achieved with the models was 64%.
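A minimal sketch of lexicon-based sentiment scoring and daily aggregation, assuming a toy word list; the study itself trained machine learning models rather than using a fixed lexicon, so every name and word below is illustrative:

```python
import re

# Tiny invented Swedish sentiment lexicon, for illustration only.
LEXICON = {"bra": 1, "stark": 1, "köp": 1, "dålig": -1, "svag": -1, "sälj": -1}

def sentiment_score(post):
    """Mean lexicon polarity of the words in a forum post;
    0.0 when no lexicon word occurs."""
    hits = [LEXICON[w] for w in re.findall(r"\w+", post.lower()) if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def daily_signal(posts):
    """Aggregate one day's posts into an up/down/flat signal."""
    total = sum(sentiment_score(p) for p in posts)
    return "up" if total > 0 else "down" if total < 0 else "flat"

posts = ["Stark rapport, köp!", "dålig utveckling idag", "köp köp köp"]
print(daily_signal(posts))
```

Such daily signals, alongside price features, would form the input to the trained classifiers.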
60

Modelling default probabilities: The classical vs. machine learning approach

Jovanovic, Filip, Singh, Paul January 2020 (has links)
Fintech companies that offer Buy Now, Pay Later products are heavily dependent on accurate default probability models, because they bear the risk of customers not fulfilling their obligations. To minimize the losses incurred when customers default, several machine learning algorithms can be applied, but in an era in which machine learning is gaining popularity there is a vast number of algorithms to choose from. This thesis addresses this issue by applying three fundamentally different machine learning algorithms in order to find the best one according to a selection of metrics such as ROC AUC and precision-recall AUC. The algorithms compared are Logistic Regression, Random Forest and CatBoost, all benchmarked against Klarna's current XGBoost model. The results indicate that the CatBoost model is the optimal one according to the main metric of comparison, the ROC AUC score: it outperformed the Logistic Regression model by seven percentage points, the Random Forest model by three percentage points, and the XGBoost model by one percentage point.
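The primary metric, ROC AUC, has a direct probabilistic reading: the chance that a randomly chosen positive case is scored higher than a randomly chosen negative one. A brute-force sketch (labels and scores invented for illustration):

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks
    a random negative (ties count one half), computed by pairwise
    comparison of all positive/negative score pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Defaults (1) should receive higher model scores than non-defaults (0).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
print(roc_auc(labels, scores))
```

A one-percentage-point gap, as between CatBoost and XGBoost here, means one extra correctly ordered pair per hundred positive/negative comparisons.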
