21 |
A Predictive Analysis of Customer Churn / En Prediktiv Analys av Kundbortfall
Eskils, Olivia; Backman, Anna. January 2023
Churn refers to the discontinuation of a contract; consequently, customer churn occurs when existing customers stop being customers. Predicting customer churn is a challenging task in customer retention, but advances in artificial intelligence and machine learning have made it more feasible. Prior studies have demonstrated that machine learning can be used to forecast customer churn. The aim of this thesis was to develop and implement a machine learning model to predict customer churn and to identify the customer features that have a significant impact on churn. The study was conducted in cooperation with the Swedish insurance company Bliwa, which expressed interest in better understanding why customers choose to leave. Three models, Logistic Regression, Random Forest, and Gradient Boosting, were used and evaluated, with Bayesian optimization used to tune them. After their predictive performance was assessed using cross-validation, it was concluded that LightGBM gave the best result in terms of PR-AUC, making it the most effective approach for the problem at hand. Subsequently, a SHAP analysis was carried out to gain insight into which customer features affect whether or not a customer churns. The SHAP analysis revealed specific customer features that had a significant influence on churn. This knowledge can be used to proactively implement measures aimed at reducing the probability of churn.
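PR-AUC, the metric this abstract uses to rank the models, is often estimated as average precision over a ranked list of predictions. The following is a minimal sketch of that estimate, assuming binary labels (1 = churned) and no tied scores; the function name, labels, and scores are illustrative and are not taken from the thesis, which does not state how its PR-AUC was computed.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true: list of 0/1 labels (1 = positive class, e.g. churned).
    scores: predicted scores, higher meaning more likely positive.
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_score, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this positive is retrieved.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Made-up example: three churners among eight customers.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
ap = average_precision(labels, scores)
print(f"average precision = {ap:.4f}")
```

Because the estimate depends only on how positives are ranked, it stays informative when churners are a small minority, which is a common reason for preferring it to plain accuracy on imbalanced data.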
|
22 |
Extracting Rules from Trained Machine Learning Models with Applications in Bioinformatics / 機械学習モデルからの知識抽出と生命情報学への応用
Liu, Pengyu. 24 May 2021
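Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. Kō 23397 / Informatics Doctorate No. 766 / 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Akutsu, Professor Akihiro Yamamoto, Professor Hisashi Kashima / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM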
|
23 |
Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods
Al-Mter, Yusur. January 2020
Heart rate variability (HRV) is the variation in time between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis uses ECG recordings taken during nocturnal rest to estimate how well HRV predicts the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy in identifying the actual class was relatively low (below 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with other related research outputs.
|
24 |
Predicting Risk of Delays in Postal Deliveries with Neural Networks and Gradient Boosting Machines / Predicering av risk för förseningar av leveranser med neurala nätverk och gradient boosting machines
Söderholm, Matilda. January 2020
This thesis studies a data set from the Swedish and Danish postal service Postnord, comparing an artificial neural network (ANN) and a gradient boosting machine (GBM) for predicting delays in package deliveries. The models are evaluated on the F1-score for the important class, which represents the delayed data points that need to be identified. The GBM had already been implemented and tuned using grid search by Postnord; the ANN is tuned using sequential model-based optimization with the tree Parzen estimator function. Furthermore, it is trained using dynamic resampling to handle the imbalanced data set. Even with several measures implemented to handle the class imbalance, the ANN performs poorly when tested on unseen data, unlike the GBM. The GBM has high precision (84%) and decent recall (24%), which produces an F1-score of 0.38. The ANN has high recall (62%) but extremely low precision (5%), which gives an F1-score of 0.08, indicating that it is biased towards predicting samples as delayed when they are on time. The GBM handles class imbalance naturally, unlike the ANN, and even with measures taken to improve the ANN and its handling of class imbalance, the GBM performs better.
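The F1-scores quoted in this abstract are harmonic means of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes them from the rounded percentages given above; the function name is illustrative, and because the inputs are rounded, the results land close to, but not exactly on, the reported 0.38 and 0.08.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values as quoted in the abstract.
gbm_f1 = f1_score(precision=0.84, recall=0.24)
ann_f1 = f1_score(precision=0.62, recall=0.05)

print(f"GBM F1 = {gbm_f1:.3f}")
print(f"ANN F1 = {ann_f1:.3f}")
```

The harmonic mean is dominated by the smaller of the two inputs, which is why the ANN's 62% recall cannot compensate for its 5% precision.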
|
25 |
Investigating the Impact of Air Pollution, Meteorology, and Human Mobility on Excess Deaths during COVID-19 in Quito: A Correlation, Regression, Machine Learning, and Granger Causality Analysis
Tariq, Waleed; Naqvi, Sehrish. January 2023
Air pollution and meteorological conditions impact COVID-19 mortality rates. This research studied Quito, Ecuador, using Granger causality tests and regression models to investigate the relationship between pollutants, meteorological variables, human mobility, and excess deaths. Results suggested that mobility, as measured by the Google Mobility Index and the Facebook Isolation Index, together with nitrogen dioxide and sulphur dioxide, significantly impacts excess deaths, while carbon monoxide and relative humidity show mixed results. Measures to reduce carbon monoxide emissions and increase humidity levels may mitigate the impact of air pollution on COVID-19 mortality rates. Further research is needed to investigate the impact of pollutants on COVID-19 transmission in other locations. Healthcare decision-makers must monitor and mitigate the impact of pollutants, promote healthy air quality policies, and encourage physical activity in safe environments. They must also consider meteorological conditions and implement measures such as increased ventilation and air conditioning to reduce exposure. Additionally, they must consider human mobility and reduce it to slow the spread of the disease. Decision-makers must monitor and track excess deaths during the pandemic to understand the impact of pollutants, meteorological conditions, and human mobility on human health. Public education is critical to raising awareness of air quality and its impact on health. Encouraging individuals to reduce their exposure to pollutants and adverse meteorological conditions can play a critical role in mitigating the impact of air pollution on respiratory health during the pandemic.
|
26 |
A Framework for Defining, Measuring, and Predicting Service Procurement Savings
Berggren, Oliver; Matti, Zina. January 2021
Recent technical advances have paved the way for transformations such as Industry 4.0, Supply Chain 4.0, and new ways for organizations to utilize services to meet the needs of people. In the midst of this shift, attention has turned to service procurement to meet the demand for everything from cloud computing and information technology to software solutions that support operations or add value for the end customer. Procurement is an integral part of organizations and typically accounts for a substantial part of their costs. Analyzing savings is one of the primary ways of measuring cost reduction and performance. This paper examines how savings can be defined and measured in a unifying way, and determines whether machine learning can be used to predict service purchase costs. Semi-structured interviews were used to find definitions and measurements. Three decision-tree ensemble machine learning models, XGBoost, LightGBM, and CatBoost, were evaluated to study cost prediction. The results indicate that cost reduction should be seen as a financial measure and cost avoidance as a performance measure. Spend and capital binding can be controlled by a budget reallocation system and could be improved further with machine learning cost prediction. The best performing model was XGBoost with a MAPE of 14.17%, compared to the base model's MAPE of 40.24%. This suggests that budget setting and negotiation can be aided by more accurate cost prediction through machine learning, which in turn can have a positive impact on an organization's resource allocation and profitability.
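MAPE, the error measure used above to compare XGBoost with the base model, is the mean of the absolute errors expressed as a percentage of the actual values. The following sketch shows the calculation on made-up purchase costs; the function name, the cost figures, and the constant baseline are illustrative assumptions and do not come from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every actual value must be non-zero, since each error is divided by it.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical purchase costs and two sets of predictions.
actual_costs = [100.0, 200.0, 400.0]
model_predictions = [110.0, 180.0, 400.0]
baseline_predictions = [150.0, 150.0, 150.0]  # e.g. a constant guess

model_mape = mape(actual_costs, model_predictions)
baseline_mape = mape(actual_costs, baseline_predictions)
print(f"model MAPE    = {model_mape:.2f}%")
print(f"baseline MAPE = {baseline_mape:.2f}%")
```

Since each error is scaled by the actual cost, MAPE lets small and large purchases contribute on the same footing, at the price of being undefined for zero-cost items.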
|
27 |
Técnicas de machine learning aplicadas na recuperação de crédito do mercado brasileiro / Machine learning techniques applied to credit recovery in the Brazilian market
Forti, Melissa. 08 August 2018
The need to know the customer has always been a differentiator in the market, and in recent years we have experienced exponential growth in the information and techniques that support this evaluation across all phases of the credit cycle, from prospecting to debt recovery. In this context, companies are increasingly investing in Machine Learning methods so that they can extract the maximum information and thus have more assertive and profitable processes. However, these techniques still face some distrust in the financial sector. Given this context, the objective of this work was to apply the Machine Learning techniques Random Forest, Support Vector Machine, and Gradient Boosting to a real collections database in order to identify the customers most likely to settle their debts (Collection Score), and to compare the accuracy and interpretability of these models with the traditional Logistic Regression methodology. The main contribution of this work is the comparison of the techniques in a credit recovery setting, considering their main characteristics, advantages, and disadvantages.
|
28 |
Strategies for Combining Tree-Based Ensemble Models
Zhang, Yi. 01 January 2017
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal was to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best performing tree-based ensemble methods (random forest, extremely randomized trees, and eXtreme gradient boosting) were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public domain data sets which have been extensively used for benchmarking classification models. The research established that the best ensemble approach was to apply random forest as the final ensemble method, integrating the selected base models with factor scores from multiple correspondence analysis.
|
29 |
Forecasting anomalies in time series data from online production environments
Sseguya, Raymond. January 2020
Anomaly detection on time series forecasts can be used in many industries, especially in forewarning systems that predict anomalies before they happen. Infor (Sweden) AB is a software company that provides Enterprise Resource Planning cloud solutions. Infor is interested in predicting anomalies in its data, and that is the motivation for this thesis work. The general idea is first to forecast the time series and then to detect and classify anomalies in the forecasted values. In this thesis work, the time series forecasting is done using two strategies, the recursive strategy and the direct strategy. The recursive strategy includes two methods: AutoRegressive Integrated Moving Average and Neural Network AutoRegression. The direct strategy is implemented with ForecastML-eXtreme Gradient Boosting. The three methods are then compared in terms of forecasting performance. Anomaly detection and classification is done by setting a decision rule based on a threshold. Since the true anomaly thresholds were not previously known, an arbitrary initial anomaly threshold is set by combining statistical methods for outlier detection with human judgement by the company commissioners. These statistical methods include Seasonal and Trend decomposition using Loess + InterQuartile Range, Twitter + InterQuartile Range, and Twitter + GESD (Generalized Extreme Studentized Deviate). Once an anomaly threshold has been defined for the usage context of Infor (Sweden) AB, a decision rule is set and used to classify anomalies in the time series forecasts. The results from comparing the classifications of the forecasts from the three forecasting methods are unsatisfactory, and no recommendation is made concerning which model or algorithm Infor (Sweden) AB should use. However, the thesis work concludes by recommending other methods that can be tried in future research.
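The threshold-based decision rule described in this abstract can be illustrated with a plain interquartile-range fence. The sketch below is a simplification under stated assumptions: it applies the IQR directly to raw historical values, leaves out the seasonal-trend decomposition, the Twitter method, GESD, and the human adjustment that the thesis combines with it, and uses made-up data, function names, and the conventional multiplier k = 1.5.

```python
import statistics


def iqr_upper_threshold(history, k=1.5):
    """Upper outlier fence Q3 + k * IQR computed from historical values."""
    q1, _median, q3 = statistics.quantiles(history, n=4)
    return q3 + k * (q3 - q1)


def classify_forecast(forecast, threshold):
    """Decision rule: a forecasted value is anomalous if it exceeds the threshold."""
    return [value > threshold for value in forecast]


history = [10, 12, 11, 13, 12, 11, 10, 12]  # made-up past observations
forecast = [11.0, 12.5, 40.0, 10.5]         # made-up forecasted values

threshold = iqr_upper_threshold(history)
flags = classify_forecast(forecast, threshold)
for value, is_anomaly in zip(forecast, flags):
    print(f"{value:6.1f} -> {'anomaly' if is_anomaly else 'normal'}")
```

Keeping the threshold calculation separate from the decision rule means the statistically derived value can be overridden by a manually chosen one without touching the classification step.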
|
30 |
Prediction of Credit Risk using Machine Learning Models
Isaac, Philip. January 2022
This thesis investigates different machine learning (ML) models and their performance in order to find the best performing model for predicting credit risk at a specific company. Since granting credit to corporate customers is part of this company's core business, managing credit risk is of high importance. The company currently has only one credit risk measurement, which is obtained through an external company, and the goal is to find a model that outperforms this measurement. The study compares two ML models, Logistic Regression (LR) and eXtreme Gradient Boosting. The thesis shows that both methods perform better than the external risk measurement and that the LR method achieves the best overall performance. One of the most important parts of the work was handling the dataset and finding the best-suited combination of features for the ML models to use.
|