Global ETD Search

1	Multi-Class Classification for Predicting Customer Satisfaction : Application of machine learning methods to predict customer satisfaction at IKEA Backerholm, Stina, Börjesjö, Malin January 2023 (has links) Gaining a comprehensive understanding of the features that contribute to customer satisfaction after contact with IKEA’s Remote Customer Meeting Points (RCMPs) is essential for implementing effective remedial measures in the future. The aim of this project is to investigate if it is possible to find key features that influence customer satisfaction and to use these to predict customer satisfaction. The task has been approached as a multi-class classification problem, with the objective of classifying the observations into five distinct levels of customer satisfaction. The study utilized three models, Multinomial Logistic Regression, Random Forest, and Extreme Gradient Boosting, to investigate these possibilities. Based on the methods used and the available data, the results indicate that it is currently not feasible to accurately identify key features or predict customer satisfaction. / Att förstå vilka faktorer som bidrar till kundnöjdhet efter en kontakt med IKEAs RCMPs är avgörande för att kunna genomföra effektiva åtgärder i framtiden. Syftet med detta projekt är att undersöka om det är möjligt att hitta nyckelfaktorer som påverkar kundnöjdhet och använda dessa för att prediktera kundnöjdhet. Uppgiften har angripits som ett multi-klass klassificeringsproblem, med syftet att klas- sificera observationerna i fem olika nivåer av kundnöjdhet. Studien har utvärderat tre olika modeller, Multinomial Logistic Regression, Random Forest och Extreme Gradient Boosting, för att undersöka dessa möjligheter. Baserat på de använda metoderna med tillgängliga data, indikerar resultaten att det för tillfället inte är möjligt att identifiera nyckelfaktorer eller prediktera kundnöjdhet med hög noggrannhet. Multi-Class Classification Imbalanced Data Machine Learning Multi-Klass Klassifisering Obalanserat Data Maskininlärning Mathematics Matematik
2	Comparison of Machine Learning Techniques when Estimating Probability of Impairment : Estimating Probability of Impairment through Identification of Defaulting Customers one year Ahead of Time / En jämförelse av maskininlärningstekniker för uppskattning av Probability of Impairment : Uppskattningen av Probability of Impairment sker genom identifikation av låntagare som inte kommer fullfölja sina återbetalningsskyldigheter inom ett år Eriksson, Alexander, Långström, Jacob January 2019 (has links) Probability of Impairment, or Probability of Default, is the ratio of how many customers within a segment are expected to not fulfil their debt obligations and instead go into Default. This is a key metric within banking to estimate the level of credit risk, where the current standard is to estimate Probability of Impairment using Linear Regression. In this paper we show how this metric instead can be estimated through a classification approach with machine learning. By using models trained to find which specific customers will go into Default within the upcoming year, based on Neural Networks and Gradient Boosting, the Probability of Impairment is shown to be more accurately estimated than when using Linear Regression. Additionally, these models provide numerous real-life implementations internally within the banking sector. The new features of importance we found can be used to strengthen the models currently in use, and the ability to identify customers about to go into Default let banks take necessary actions ahead of time to cover otherwise unexpected risks. / Titeln på denna rapport är En jämförelse av maskininlärningstekniker för uppskattning av Probability of Impairment. Uppskattningen av Probability of Impairment sker genom identifikation av låntagare som inte kommer fullfölja sina återbetalningsskyldigheter inom ett år. Probability of Impairment, eller Probability of Default, är andelen kunder som uppskattas att inte fullfölja sina skyldigheter som låntagare och återbetalning därmed uteblir. Detta är ett nyckelmått inom banksektorn för att beräkna nivån av kreditrisk, vilken enligt nuvarande regleringsstandard uppskattas genom Linjär Regression. I denna uppsats visar vi hur detta mått istället kan uppskattas genom klassifikation med maskininlärning. Genom användandet av modeller anpassade för att hitta vilka specifika kunder som inte kommer fullfölja sina återbetalningsskyldigheter inom det kommande året, baserade på Neurala Nätverk och Gradient Boosting, visas att Probability of Impairment bättre uppskattas än genom Linjär Regression. Dessutom medför dessa modeller även ett stort antal interna användningsområden inom banksektorn. De nya variabler av intresse vi hittat kan användas för att stärka de modeller som idag används, samt förmågan att identifiera kunder som riskerar inte kunna fullfölja sina skyldigheter låter banker utföra nödvändiga åtgärder i god tid för att hantera annars oväntade risker. Classification Imbalanced Data Machine Learning Probability of Impairment Risk Management Klassificering Obalanserat Data Maskininlärning Probability of Impairment Riskhantering Mathematics Matematik
3	Anomaly Detection in Categorical Data with Interpretable Machine Learning : A random forest approach to classify imbalanced data Yan, Ping January 2019 (has links) Metadata refers to "data about data", which contains information needed to understand theprocess of data collection. In this thesis, we investigate if metadata features can be usedto detect broken data and how a tree-based interpretable machine learning algorithm canbe used for an effective classification. The goal of this thesis is two-fold. Firstly, we applya classification schema using metadata features for detecting broken data. Secondly, wegenerate the feature importance rate to understand the model’s logic and reveal the keyfactors that lead to broken data. The given task from the Swedish automotive company Veoneer is a typical problem oflearning from extremely imbalanced data set, with 97 percent of data belongs healthy dataand only 3 percent of data belongs to broken data. Furthermore, the whole data set containsonly categorical variables in nominal scales, which brings challenges to the learningalgorithm. The notion of handling imbalanced problem for continuous data is relativelywell-studied, but for categorical data, the solution is not straightforward. In this thesis, we propose a combination of tree-based supervised learning and hyperparametertuning to identify the broken data from a large data set. Our methods arecomposed of three phases: data cleaning, which is eliminating ambiguous and redundantinstances, followed by the supervised learning algorithm with random forest, lastly, weapplied a random search for hyper-parameter optimization on random forest model. Our results show empirically that tree-based ensemble method together with a randomsearch for hyper-parameter optimization have made improvement to random forest performancein terms of the area under the ROC. The model outperformed an acceptableclassification result and showed that metadata features are capable of detecting brokendata and providing an interpretable result by identifying the key features for classificationmodel. machine learning decision tree imbalanced data anomaly detection random forest maskininlärning beslut träd obalanserat data anomalitetsdetektering Probability Theory and Statistics Sannolikhetsteori och statistik

Search results

Multi-Class Classification for Predicting Customer Satisfaction : Application of machine learning methods to predict customer satisfaction at IKEA

Anomaly Detection in Categorical Data with Interpretable Machine Learning : A random forest approach to classify imbalanced data