Global ETD Search

1	Predictive analysis at Kronofogden: Classifying first-time debtors with an uplift model
Rantzer, Måns, January 2016
The use of predictive analysis is becoming more commonplace with each passing day, which strengthens the case that even governmental institutions should adopt it. Kronofogden is in the middle of a digitization process and is therefore in a unique position to build predictive analysis into the core of its operations. This project studies whether methods from predictive analysis, specifically uplift modeling, can predict how many debts will be received for a first-time debtor. Unlike conventional modeling, uplift modeling aims to measure the difference in behavior after a treatment, in this case guidance from Kronofogden. Another aim of the project is to examine whether the scarce literature on uplift modeling is right that the conventional two-model approach fails to perform well in practical situations. The project shows results similar to Kronofogden’s internal evaluations. Three models were compared: random forests, gradient-boosted models and neural networks, the last performing the best. Positive uplift could be found for 1-5% of the debtors, meaning the current cutoff level of 15% is too high. The models have several potential sources of error, however: the modeling choices, data that might not be informative enough, or an actual expected uplift for new data that is equal to zero.
predictive analysis, machine learning, Kronofogden, uplift, uplift modeling
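The two-model approach mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the binary outcome, the segment labels, the toy data and the function names `fit_rate_model` and `two_model_uplift` are all assumptions made for illustration, and the "model" is just a per-segment mean rather than a random forest or neural network.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Fit a deliberately trivial 'model': the mean outcome per segment.

    rows is an iterable of (segment, outcome) pairs with outcome in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [sum of outcomes, count]
    for segment, outcome in rows:
        totals[segment][0] += outcome
        totals[segment][1] += 1
    return {segment: s / n for segment, (s, n) in totals.items()}


def two_model_uplift(treated_rows, control_rows):
    """Two-model approach: fit one model per group, then subtract predictions."""
    model_treated = fit_rate_model(treated_rows)
    model_control = fit_rate_model(control_rows)
    shared = model_treated.keys() & model_control.keys()
    return {seg: model_treated[seg] - model_control[seg] for seg in shared}


# Hypothetical toy data: (segment, outcome) pairs, not taken from the thesis.
treated = [("a", 1), ("a", 1), ("a", 0), ("a", 0), ("b", 1), ("b", 0)]
control = [("a", 0), ("a", 0), ("a", 1), ("a", 0), ("b", 1), ("b", 0)]
uplift = two_model_uplift(treated, control)
```

Each model is fitted to its own group without reference to the other, so neither is optimized for the difference between them, which is the usual criticism of this approach.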
2	Inkrementell responsanalys: Vilka kunder bör väljas vid riktad marknadsföring? / Incremental response analysis: Which customers should be selected in direct marketing?
Karlsson, Jonas; Karlsson, Roger, January 2013
If customers respond differently to a campaign, it is worthwhile to find the customers who respond most positively and direct the campaign towards them. This can be done with so-called incremental response analysis, in which respondents from a campaign are compared with respondents from a control group. The customers with the highest increase in response from the campaign are selected, which may increase the company’s return. Incremental response analysis is applied to historical data from the mobile operator Tre. The thesis investigates which method best explains the incremental response, that is, which method best identifies the Tre customers with the highest incremental response, and which characteristics are important. The analysis is based on various classification methods: logistic regression, Lasso regression and decision trees. RMSE, the root mean square error of the deviation between observed and predicted incremental response, is used to measure the prediction error of the incremental response. The classification methods are evaluated with the Hosmer-Lemeshow test and AUC (Area Under the Curve). Bayesian logistic regression is also used to examine the uncertainty in the parameter estimates. In terms of predicted incremental response, Lasso regression performs best compared with the decision tree, ordinary logistic regression and Bayesian logistic regression. According to the Lasso regression, the variables that significantly affect the incremental response are age and how long the customer has had their subscription.
Incremental response modeling, uplift modeling, database marketing, Net information value, Lasso regression, Bayesian logistic regression, decision trees, logistic regression
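The RMSE measure described in the abstract above can be sketched as follows. This is an illustrative reading, not the thesis's procedure: it assumes the observed incremental response is computed per customer group as the treated response rate minus the control response rate, and the grouping, the data and the function names are hypothetical.

```python
import math


def incremental_response(treated_outcomes, control_outcomes):
    """Observed incremental response: treated response rate minus control rate."""
    rate_treated = sum(treated_outcomes) / len(treated_outcomes)
    rate_control = sum(control_outcomes) / len(control_outcomes)
    return rate_treated - rate_control


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example with two customer groups (0/1 = did not respond/responded).
observed = [
    incremental_response([1, 1, 0, 0], [0, 0, 0, 0]),  # rates 0.5 and 0.0
    incremental_response([1, 0, 0, 0], [1, 0, 0, 0]),  # rates 0.25 and 0.25
]
predicted = [0.25, 0.25]  # assumed model predictions per group
error = rmse(observed, predicted)
```

A lower value means the predicted incremental response tracks the observed one more closely across groups.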
3	Uplift Modeling: Identifying Optimal Treatment Group Allocation and Whom to Contact to Maximize Return on Investment
Karlsson, Henrik, January 2019
This report investigates the possibility of modeling the causal effect of treatment in the insurance domain in order to increase the return on investment of telemarketing sales. Capturing the causal effect requires two or more subgroups, one of which receives the control treatment. Two uplift models are used to model the causal effect of treatment: the Class Transformation Method and Modeling Uplift Directly with Random Forests. Both are evaluated by the Qini curve and the Qini coefficient. Because modeling the causal effect requires a comparison with a control group, the report attempts to find the treatment group allocation that maximizes the precision of the estimated difference between the treatment group and the control group. The report also provides a rule of thumb that ensures the control group is large enough to model the causal effect. The insurance company If provided the data used to model uplift, which consists of approximately 630,000 customer interactions and 60 features. The total uplift in the data set, that is, the difference in purchase rate between the treatment group and the control group, is approximately 3%. Uplift by random forest with a Euclidean distance splitting criterion, which tries to maximize the distributional divergence between treatment group and control group, performs best and captures 15% of the theoretical best model. The same model captures 77% of the total purchases in the treatment group while giving treatment to only half of the treatment group. With the purchase rates in the data set, the optimal treatment group allocation is approximately 58%-70%, but the study could be performed with a treatment group allocation as high as approximately 97%.
Causal Effect, Uplift Modeling, Class Transformation Method, Model Uplift Directly, Random Forest, XGBoost, Qini Curve, Qini Coefficient, Optimal Control Group Allocation, Probability Theory and Statistics
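The Qini curve and Qini coefficient named in the abstract above can be sketched as follows. This follows one common convention and is not necessarily the one used in the report: individuals are ranked by predicted uplift, the curve value after the top k is the number of treated responders minus the number of control responders rescaled to the treated group size, and the coefficient is the area between the model curve and the straight line of random targeting, with the targeted fraction on the x-axis. Normalizations differ between sources, and the function names and data are hypothetical.

```python
def qini_curve(scores, treated, outcomes):
    """Return Qini curve values after targeting the top 0, 1, ..., n individuals.

    scores: predicted uplift per individual (higher means target first)
    treated: 1 if the individual was in the treatment group, else 0
    outcomes: 1 if the individual responded, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = y_t = y_c = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # Rescale control responders to the size of the treated group so far.
        scaled_control = y_c * n_t / n_c if n_c else 0.0
        curve.append(y_t - scaled_control)
    return curve


def qini_coefficient(curve):
    """Area between the model curve and the random-targeting line (x in [0, 1])."""
    n = len(curve) - 1
    if n == 0:
        return 0.0
    area_model = sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n
    area_random = curve[-1] / 2  # triangle under the line from 0 to the final value
    return area_model - area_random


# Hypothetical toy data for four individuals.
scores = [0.9, 0.8, 0.2, 0.1]
treated = [1, 0, 1, 0]
outcomes = [1, 0, 0, 1]
curve = qini_curve(scores, treated, outcomes)
coefficient = qini_coefficient(curve)
```

A model that ranks individuals no better than chance yields a coefficient near zero under this convention.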
4	Statistical approaches to enhance decision support in time series and causality problems
Bokelmann, Björn, 11 November 2024
Predictive models are useful methods for quantitative decision support in contemporary business.
However, the data sets often contain statistical problems that hinder effective predictive modeling and decision support. This thesis analyzes such frequently occurring statistical problems and provides statistical approaches to overcome them and thereby enable effective predictive modeling and decision support. The first part of the thesis focuses on concept drift in Google Trends time series data. The thesis provides an empirical analysis of the problem and an approach to clean the data. For the specific use case of tourism demand forecasting in Germany, the thesis demonstrates the usefulness of the cleaning approach empirically. The second part of the thesis focuses on experiments and models to estimate heterogeneous treatment effects of individuals. In such applications, noise in the data poses a statistical challenge: it leads to high sample size requirements for randomized experiments and can lead to unexpected negative results in the treatment allocation process. To overcome this problem, the thesis proposes methods to conduct experiments with a limited number of individuals without impairing the decision support. Moreover, the thesis analyzes the potential adverse effects of noise on the treatment allocation process and provides ideas on how to prevent them.
Time Series Forecasting, Uplift Modeling, Causality, Decision Support
330 Economics, QH 230, QH 232, QH 244, QH 243, QH 250, ddc:330, ddc:000
5	Machine Learning Based Prediction and Classification for Uplift Modeling / Maskininlärningsbaserad prediktion och klassificering för inkrementell responsanalys
Börthas, Lovisa; Krange Sjölander, Jessica, January 2020
The desire to model the true gain from targeting an individual for marketing purposes has led to the common use of uplift modeling. Uplift modeling requires a treatment group as well as a control group, and the objective is to estimate the difference between the success probabilities in the two groups. Statistical machine learning methods are efficient for estimating these probabilities. This project investigates three uplift modeling approaches: Subtraction of Two Models, Modeling Uplift Directly and the Class Variable Transformation. The statistical machine learning methods applied are Random Forests and Neural Networks, along with the standard method Logistic Regression. The data were collected from a well-established retail company, and the purpose of the project is to investigate which combination of uplift modeling approach and statistical machine learning method yields the best performance on these data. The variable selection step proved to be a crucial component of the modeling process, as did the amount of control data in each data set. For the uplift modeling to be successful, the method of choice should be either Modeling Uplift Directly using Random Forests or the Class Variable Transformation using Logistic Regression. Neural-network-based approaches are sensitive to uneven class distributions and were hence unable to obtain stable models on the data used in this project. Furthermore, the Subtraction of Two Models did not perform well because each model tended to focus on modeling the class in its own data set separately instead of modeling the difference between the class probabilities.
The conclusion is hence to use an approach that models the uplift directly and to include a large amount of control data in each data set.
Uplift Modeling, Data Pre-Processing, Predictive Modeling, Incremental Modeling, Random Forests, Logistic Regression, Neural Networks, Ensemble Methods, Machine Learning, Multi-Layer Perceptron, Mathematics
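The Class Variable Transformation named in the abstract above can be sketched in its standard textbook form, which may differ in detail from the thesis. A new label Z is set to 1 for treated responders and for control non-responders, and to 0 otherwise. Assuming treatment and control groups of equal size, the uplift equals 2 P(Z = 1 | x) - 1. The per-segment frequency estimate below stands in for a real classifier such as logistic regression, and the function names and data are hypothetical.

```python
def transform_class(treated, outcome):
    """Z = 1 for treated responders and control non-responders, else 0."""
    return 1 if treated == outcome else 0


def uplift_from_transformed(rows):
    """Estimate uplift per segment as 2 * P(Z = 1 | segment) - 1.

    rows is an iterable of (segment, treated, outcome) triples with 0/1 flags.
    The formula assumes equally sized treatment and control groups per segment.
    """
    counts = {}  # segment -> (sum of Z, number of rows)
    for segment, treated, outcome in rows:
        z_sum, n = counts.get(segment, (0, 0))
        counts[segment] = (z_sum + transform_class(treated, outcome), n + 1)
    return {segment: 2 * z_sum / n - 1 for segment, (z_sum, n) in counts.items()}


# Hypothetical toy data: four treated and four control individuals in one segment.
rows = [
    ("a", 1, 1), ("a", 1, 1), ("a", 1, 0), ("a", 1, 0),  # treated: response rate 0.5
    ("a", 0, 0), ("a", 0, 0), ("a", 0, 1), ("a", 0, 0),  # control: response rate 0.25
]
uplift = uplift_from_transformed(rows)
```

In this example the estimate equals the difference between the two response rates, 0.5 - 0.25, but it is obtained from a single model of Z rather than from two separately fitted models.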

