Global ETD Search

1	Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods Alstermark, Olivia, Stolt, Evangelina January 2021 (has links) When a company evaluates a customer for being a potential prospect, one of the key questions to answer is whether the customer will generate profit in the long run. A possible step to answer this question is to predict the likelihood of the customer returning to the company again after the initial purchase. The aim of this master thesis is to investigate the possibility of using machine learning techniques to predict the likelihood of a new customer returning for a second purchase within a certain time frame. To investigate to what degree machine learning techniques can be used to predict probability of return, a number of di↵erent model setups of Logistic Lasso, Support Vector Machine and Extreme Gradient Boosting are tested. Model development is performed to ensure well-calibrated probability predictions and to possibly overcome the diculty followed from an imbalanced ratio of returning and non-returning customers. Throughout the thesis work, a number of actions are taken in order to account for data protection. One such action is to add noise to the response feature, ensuring that the true fraction of returning and non-returning customers cannot be derived. To further guarantee data protection, axes values of evaluation plots are removed and evaluation metrics are scaled. Nevertheless, it is perfectly possible to select the superior model out of all investigated models. The results obtained show that the best performing model is a Platt calibrated Extreme Gradient Boosting model, which has much higher performance than the other models with regards to considered evaluation metrics, while also providing predicted probabilities of high quality. Further, the results indicate that the setups investigated to account for imbalanced data do not improve model performance. The main con- clusion is that it is possible to obtain probability predictions of high quality for new customers returning to a company for a second purchase within a certain time frame, using machine learning techniques. This provides a powerful tool for a company when evaluating potential prospects. Purchase Probability Prediction Machine Learning Models Well-Calibrated Probabilities Imbalanced Data Data Protection Mathematics Matematik
2	Venn Prediction for Survival Analysis : Experimenting with Survival Data and Venn Predictors Aparicio Vázquez, Ignacio January 2020 (has links) The goal of this work is to expand the knowledge on the field of Venn Prediction employed with Survival Data. Standard Venn Predictors have been used with Random Forests and binary classification tasks. However, they have not been utilised to predict events with Survival Data nor in combination with Random Survival Forests. With the help of a Data Transformation, the survival task is transformed into several binary classification tasks. One key aspect of Venn Prediction are the categories. The standard number of categories is two, one for each class to predict. In this work, the usage of ten categories is explored and the performance differences between two and ten categories are investigated. Seven data sets are evaluated, and their results presented with two and ten categories. For the Brier Score and Reliability Score metrics, two categories offered the best results, while Quality performed better employing ten categories. Occasionally, the models are too optimistic. Venn Predictors rectify this performance and produce well-calibrated probabilities. / Målet med detta arbete är att utöka kunskapen om området för Venn Prediction som används med överlevnadsdata. Standard Venn Predictors har använts med slumpmässiga skogar och binära klassificeringsuppgifter. De har emellertid inte använts för att förutsäga händelser med överlevnadsdata eller i kombination med Random Survival Forests. Med hjälp av en datatransformation omvandlas överlevnadsprediktion till flera binära klassificeringsproblem. En viktig aspekt av Venn Prediction är kategorierna. Standardantalet kategorier är två, en för varje klass. I detta arbete undersöks användningen av tio kategorier och resultatskillnaderna mellan två och tio kategorier undersöks. Sju datamängder används i en utvärdering där resultaten presenteras för två och tio kategorier. För prestandamåtten Brier Score och Reliability Score gav två kategorier de bästa resultaten, medan för Quality presterade tio kategorier bättre. Ibland är modellerna för optimistiska. Venn Predictors korrigerar denna prestanda och producerar välkalibrerade sannolikheter. Computer and Information Sciences Data- och informationsvetenskap

Search results

Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods

Venn Prediction for Survival Analysis : Experimenting with Survival Data and Venn Predictors