11

Predicting Customer Churn Using Recurrent Neural Networks / Prediktera kundbeteende genom användning av återkommande neurala nätverk

Ljungehed, Jesper January 2017
Churn prediction is used to identify customers that are becoming less loyal and is an important tool for companies that want to stay competitive in a rapidly growing market. In retail, a dynamic definition of churn is needed to identify churners correctly. Customer Lifetime Value (CLV) is the monetary value of a customer relationship. No change in CLV for a given customer indicates a decrease in loyalty. This thesis proposes a novel approach to churn prediction. The proposed model uses a Recurrent Neural Network to identify churners based on Customer Lifetime Value time series regression. The results show that the model performs better than random. This thesis also investigated the use of the K-means algorithm as a replacement for a rule-extraction algorithm. The K-means algorithm contributed to a more comprehensive analytical context regarding the churn prediction of the proposed model. / Churn prediction is used to identify customers who are on the way to becoming less loyal, and it is a helpful tool for a company that wants to run a competitive business. In retail, a dynamic definition of churn is needed to identify churning customers correctly. Customer lifetime value is a measure of the monetary value of a customer relationship. When this value stops changing, it indicates a decrease in the customer's loyalty. This report proposes a new method for churn prediction. The proposed method consists of a recurrent neural network that is used to identify churn among customers by predicting their lifetime value. The results show that the proposed model performs better than a random method. The report also examines the use of a k-means algorithm as a substitute for a rule-extraction algorithm. The k-means algorithm contributed to a more comprehensive analysis of the churn prediction.
12

Player Activity Sequence Analysis Using Process Mining : Player churn prediction and Abnormal player sequences detection using process mining on the data from a live game

Maragoni, Varun Goud January 2022
Background: Game analytics is a field that aims to analyze games and help improve game development. Data mining is a prominent technique for game analytics. Recent advances in process mining have motivated users to apply it to real-world scenarios in order to derive process-oriented insights. In this study, we discuss how process mining can be used in game analytics. Objective: The goal of this study is to apply process mining to player data from a live game, analyze the results, and determine whether these results can be interpreted, whether we can derive any patterns or insights that are useful for game designers, and whether process mining can be used in game analytics and, if so, what kind of versatility it can offer. This study also provides approaches for using process mining in player churn prediction and in the detection of abnormal player activity sequences. Method: First, a literature review is performed to survey the process mining techniques and the metrics used to evaluate discovered process models. Experiments are then conducted by applying process mining to data from a live game, building a churn predictor using process mining, and determining a technique to identify abnormal player sequences. Results: Process discovery algorithms are applied to data from a live game and the results are analyzed. Several process models are discovered to identify player churn, and they are compared with a baseline machine learning churn predictor trained on the same data. Abnormal player activity sequences of the game are determined using process mining, compared with expected player sequences, and analyzed with the help of game designers. Conclusion: Process mining can be utilized in game analytics to discover new process-oriented insights. When compared to typical data mining techniques, the results gained by process mining are more versatile. It also has other capabilities, such as detecting unusual sequences in data.
13

Churn Prediction : Predicting User Churn for a Subscription-based Service using Statistical Analysis and Machine Learning Models

Flöjs, Amanda, Hägg, Alexandra January 2020
Subscription-based services are becoming more popular in today’s society. Therefore, any company that engages in subscription-based business needs to understand user behavior and minimize the number of users canceling their subscription, i.e. minimize churn. According to marketing metrics, the probability of selling to an existing user is markedly higher than that of selling to a brand new user. For that reason, it is of great importance that more focus is directed towards preventing users from leaving the service, in other words preventing user churn. To be able to prevent user churn, the company needs to identify the users at risk of churning. Therefore, this thesis project treats this as a classification problem. The objective of the thesis project was to develop a statistical model to predict churn for a subscription-based service. Various statistical methods were used to identify patterns in user behavior using activity and engagement data, including variables describing recency, frequency, and volume. The best-performing statistical model for predicting churn was obtained with the Random Forest algorithm. The selected model is able to separate the two classes of churning and non-churning users with 73% probability and has a fairly low misclassification rate of 35%. The results show that it is possible to predict user churn using statistical models, although there are indications that it is difficult for the model to generalize a specific behavioral pattern for user churn. This is understandable since human behavior is hard to predict. The results show that variables describing how frequently the user interacts with the service explain the most about whether a user is likely to churn. / Subscription services are becoming increasingly popular in today’s society. It is therefore important for a company with a subscription-based business to have a good understanding of its users' behavioral patterns on the service, and to reduce the number of users who cancel their subscription. According to marketing statistics, the probability of selling to an existing user is considerably higher than that of selling to an entirely new one. For that reason, it is important that considerable focus is directed towards preventing users from leaving the service. To prevent users from leaving, the company must identify which users are at risk of leaving. This thesis has therefore been treated as a classification problem. The aim of the work was to develop a statistical model to predict which users are likely to leave the subscription service within the next month. Various statistical methods were tested to identify users' behavioral patterns in activity and engagement data, which include variables describing the most recent interaction, frequency, and volume. The best performance in predicting whether a user will leave the service was given by the Random Forest algorithm. The selected model can separate the two classes, users who leave the service and users who stay, with 73% probability and has a relatively low misclassification rate of 35%. The results show that it is possible to predict which users are at risk of leaving the service using statistical models, even though it is difficult for the model to generalize a specific behavioral pattern for the different groups. This is understandable, however, since it is human behavior that the model is trying to predict. The results indicate that variables describing the frequency of use of the service say more about whether a user is about to leave than variables describing the user's activity in volume.
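The abstract above reports that the model separates churners from non-churners "with 73% probability". One common reading of such a figure is the area under the ROC curve (AUC), i.e. the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. That reading is an assumption here, not something the abstract states. A minimal Python sketch of this pairwise interpretation, with made-up scores and a hypothetical function name:

```python
from itertools import product


def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (churner, non-churner) pairs in which the churner
    gets the higher score; ties count as half a win."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative scores only (not data from the thesis).
churner_scores = [0.9, 0.7, 0.4]
stayer_scores = [0.8, 0.3, 0.2, 0.1]

auc = pairwise_auc(churner_scores, stayer_scores)
print(f"pairwise AUC = {auc:.3f}")
```

The quadratic loop is only meant to make the definition visible; for real data sets a rank-based computation would be the usual choice.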
14

Reálná úloha dobývání znalostí / Actual role of knowledge discovery in databases

Pešek, Jiří January 2012
The thesis "Actual role of knowledge discovery in databases" is concerned with churn prediction in mobile telecommunications. The work is based on real data from a telecommunications company and covers all steps of the data mining process. In accordance with the CRISP-DM methodology, the work looks thoroughly at the following stages: business understanding, data understanding, data preparation, modeling, evaluation and deployment. The tool IBM SPSS Modeler was selected as the system for knowledge discovery in databases. The introductory chapter of the theoretical part familiarises the reader with so-called churn management, which is the subject of the assignment; the basic concepts related to data mining are defined in that chapter as well. Attention is also given to the basic types of knowledge discovery tasks in databases and to the algorithms pertinent to the selected assignment (decision trees, regression, neural networks, Bayesian networks and SVM). The methodology describing the phases of knowledge discovery in databases is covered in a separate chapter, in which CRISP-DM is examined in greater detail, since it forms the foundation for the solution of the practical assignment. The theoretical part concludes with an overview of commercial and freely available systems for knowledge discovery in databases.
15

Time-Series Classification: Technique Development and Empirical Evaluation

Yang, Ching-Ting 31 July 2002
Many interesting applications involve decision prediction based on a time-series sequence or a set of time-series sequences; these are referred to as time-series classification problems. Past classification analysis research predominantly focused on constructing a classification model from training instances whose attributes are atomic and independent. Direct application of traditional classification analysis techniques to time-series classification problems requires transforming time-series data into non-time-series attributes by applying statistical operations (e.g., average or sum). However, such statistical transformation often results in information loss. In this thesis, we proposed the Time-Series Classification (TSC) technique, based on the nearest neighbor classification approach. The empirical evaluation showed that the proposed time-series classification technique performed better than the statistical-transformation-based approach.
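To illustrate the general idea of nearest-neighbor classification applied directly to whole series, here is a minimal Python sketch. It is not the TSC technique from the thesis: the abstract does not say which distance measure or neighbor count is used, so the Euclidean distance, the single-neighbor rule, the function names and the toy data are all assumptions.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(query, training, distance=euclidean):
    """Return the label of the training series closest to the query."""
    best_label, best_dist = None, math.inf
    for series, label in training:
        d = distance(query, series)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy training set (assumed): each series is kept whole instead of being
# reduced to summary statistics such as its mean.
training = [
    ([1, 2, 3, 4], "rising"),
    ([4, 3, 2, 1], "falling"),
    ([2, 2, 2, 2], "flat"),
]

print(classify_1nn([1.1, 2.2, 2.9, 4.1], training))
```

Note that all three toy series except the flat one share the same mean (2.5), so a classifier that only saw the average could not tell "rising" from "falling"; this is the kind of information loss the abstract refers to.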
16

Trajectory-based methods to predict user churn in online health communities

Joshi, Apoorva 01 May 2018
Online Health Communities (OHCs) have positively disrupted the modern global healthcare system, as patients and caregivers interact online with similar peers to improve their quality of life. Social support is the pillar of OHCs and, hence, analyzing the different types of social support activities contributes to a better understanding and prediction of future user engagement in OHCs. This thesis used data from a popular OHC, Breastcancer.org, to first classify user posts in the community into the different categories of social support, using Word2Vec for language processing. Six different classifiers were explored, leading to the conclusion that Random Forest was the best approach for classifying the user posts. This exercise helped identify the different types of social support activities that users participate in and detect the most common type of social support activity among users in the community. Thereafter, three trajectory-based methods were proposed and implemented to predict user churn (attrition) from the OHC. Comparison of the proposed trajectory-based methods with two non-trajectory-based benchmark methods helped establish that user trajectories, which represent the month-to-month change in the type of social support activity of users, are effective indicators of user churn from the community. The results and findings from this thesis could help OHC managers better understand the needs of users in the community and take the necessary steps to improve user retention and community management.
17

Comparison of Machine learning algorithms on Predicting Churn within Music streaming service

Gaddam, Lahari, Kadali, Sree Lakshmi Hiranmayee January 2022
Background: Customer churn prediction is one of the most popular parts of big businesses and often helps companies with customer retention and revenue generation. Customer churn may lead to a huge loss of revenue, so it is important to analyze and determine the cause of churn. Moreover, it is easier to retain an existing customer than to acquire new clients. Therefore, to get a better understanding of churn prediction, this research work focuses on finding the best-performing machine learning model through a comparison of four machine learning models. The research also gives a brief report of the latest literature on churn analysis of music streaming services. Objectives: In this thesis work, we aim to research churn prediction in music streaming services. We focus on two main objectives. The first is a literature review of the latest research on churn prediction for music streaming services. The second is to compare the performance of four supervised machine learning algorithms in order to find the best-performing algorithm for churn prediction. Methods: This thesis uses two methods, literature review and experimentation, to answer our research questions. We chose a literature review for RQ1 because it gives a better understanding of the selected problem and serves as the basis for our research. Experimentation is chosen for RQ2 to build and train the selected machine learning models and validate the performance of the algorithms. Experimentation is chosen because it gives better results and predictions compared to surveys and reviews. Results: We selected four supervised machine learning classification algorithms in this research, namely Logistic Regression, Naive Bayes, KNN, and RF. Upon experimentation and training the models on the preprocessed KKBox’s dataset, RF achieved the highest accuracy, 97%, compared to the other models. Conclusions: We have trained four models using the four machine learning algorithms for the prediction of churn in the music streaming service domain. Upon training the models with the KKBox’s dataset and upon experimentation, we came to the conclusion that RF has the best performance, with better accuracy and AUC score.
18

Churnprediktion baserat på kundens första köp / Churn prediction based on the customer's first purchase

Ivarsson Orrelid, Christoffer, Pettersson, Oskar, Thornander, Jonathan January 2022
Many companies are regularly affected by churn, a condition in which existing customers stop shopping with the company or using its services and turn to competitors instead. To ensure loyalty among customers, companies therefore need to establish methods for winning the customer's trust early. With the help of machine learning, the process of identifying churn can be automated, which is known as churn prediction. There is a great deal of research on churn prediction, above all in the telecom sector and among companies that offer subscription services. The majority of earlier examples, however, build on customer data collected at several points in time and aim to predict churn over a longer period, usually within a year. There are fewer examples in the context of e-commerce, and less research on how machine learning can be applied to identify churn within a shorter period based solely on data from the customer's first purchase. In this study, two machine learning models were developed, based on the Random Forest algorithm and the Logistic Regression algorithm. The aim was to investigate which algorithm is best suited to predicting whether a given customer will buy again within a three-month period, using only data from the customer's first purchase. The investigation was based on data from a Swedish e-commerce company. The models were evaluated with metrics for classification problems, including Cohen’s kappa and AUC. Although Logistic Regression turns out to perform slightly better, the results indicate that both models generally have difficulty determining whether or not the customer will churn. One possible explanation is considered to be the restrictiveness of the data set, which only contains data from the customer's first purchase. On the other hand, both models are found to be able to filter out customers who run a high risk of churning, with Random Forest proving slightly better at this. Finally, it was found that the models do not show a strong improvement over a naive solution in which all customers are assumed to churn, but since even small improvements mean that the company can save money, the usefulness of the models can nevertheless be justified. / Companies are continuously affected by churn, a condition where existing customers turn to competitors instead of using the company’s services. To ensure customer loyalty, it is vital for the company to establish methods to gain the customers' trust early on. With the help of machine learning, the process for identifying churn can be automated, known as churn prediction. Research on churn prediction is abundant, especially concerning the telecom sector and subscription-based services. Most of these articles, however, are based on additional, historical data surrounding the customer, aiming to predict churn within a longer time frame, usually a year. The articles focusing on e-commerce, combined with how machine learning can be applied to identify churn within a short period, based solely on data from the customer’s first purchase, are scarce. Two machine learning models are developed based on the Random Forest algorithm and the Logistic Regression algorithm. These are tested to see which algorithm is best suited for predicting whether a given customer will buy again or not within a three-month period, with only data from the customer's first purchase from a Swedish e-commerce company. The models were then evaluated with classification metrics, including Cohen’s kappa and AUC. Despite the fact that Logistic Regression performed slightly better, the results showed that both models struggled with the churn prediction. A possible explanation is the restrictiveness of the data set.
However, with the option of changing the calibration points on the models’ confidence, allowing the filtration of customers who have a greater chance of leading to churn, both models performed better with Random Forest being slightly superior. The models are considered a slight improvement to a naïve solution where all customers are treated as possible churn. They are also useful given the context, where even minor prevention of churn can lead to profit for the company.
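Cohen's kappa, one of the metrics named in this abstract, corrects raw agreement for the agreement expected by chance, which makes it a natural way to compare a model against the naive "everyone churns" baseline mentioned above. The following Python sketch uses the standard formula kappa = (p_o - p_e) / (1 - p_e); the labels and predictions are invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # both label lists contain a single identical class
    return (observed - expected) / (1.0 - expected)


# Invented labels: 1 = customer churns, 0 = customer buys again.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
naive_pred = [1] * len(y_true)            # naive baseline: everyone churns
model_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

kappa_naive = cohens_kappa(y_true, naive_pred)
kappa_model = cohens_kappa(y_true, model_pred)
print(f"naive baseline kappa = {kappa_naive:.3f}")
print(f"model kappa          = {kappa_model:.3f}")
```

In this toy data the naive baseline reaches 60% accuracy but a kappa of zero, which shows why accuracy alone can make a baseline that labels every customer as churn look better than it is.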
19

Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies

Kulhavý, Lukáš January 2010
This thesis focuses on data mining technology and its possible practical use in health insurance companies. The thesis defines the term data mining and its relation to the term knowledge discovery in databases. The term data mining is explained, inter alia, through methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). The thesis also covers possible practical applications, technologies and products available on the market (both free and commercial). An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical basis on which practical applications using real data from real health insurance companies are built. These applications seek the causes of increased remittances and predict churn. I solved them in the freely available systems Weka and LISP-Miner. The objective is to introduce and demonstrate data mining capabilities on this type of data and to demonstrate the capabilities of the Weka and LISP-Miner systems in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers an insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system; grid computing can be used through the Weka Experimenter interface.
20

Predicting Customer Churn in E-commerce Using Statistical Modeling and Feature Importance Analysis : A Comparison of Random Forest and Logistic Regression Approaches

Rudälv, Amanda January 2023
While operating in online markets offers opportunities for expanded assortment and convenience, it also poses challenges such as increased competition and the need to build personal relationships with customers. Customer retention becomes crucial in maintaining a successful business, emphasizing the importance of understanding customer behavior. Traditionally, customer behavior analysis has focused on transactional behavior, such as purchase frequency and spending amounts. However, there has been a shift towards non-transactional behavior, driven by the popularity of loyalty programs that reward customers beyond transactions and aim to make customers feel appreciated and included, regardless of their spending power. This study is conducted at a global retailer with the aim of enhancing the understanding of how non-transactional customer behavior influences customer churn. The approach in this study is to understand such behavior by developing a statistical model and analyzing statistical approaches to feature importance. Two types of approaches for statistical modeling, each with four variations, are assessed: (1) Random forest; and (2) Logistic regression. Furthermore, three different feature importance methods are considered: (1) Gini importance; (2) Permutation importance; and (3) Coefficient importance. The results showed that this approach can be used to analyze customer behavior and gain a better understanding of the driving factors for churn. Furthermore, the results showed that random forest approaches outperform logistic regression. With the definition of churn constructed in this study, the most important factors that affect the probability of churn are the customer’s number of sessions and inter-session interval. / Running an e-commerce business not only offers opportunities for an expanded assortment and convenience, but also leads to increased competition and a greater need to build relationships with customers. Customer loyalty is therefore crucial for maintaining a successful business, which underlines the importance of understanding customer behavior. Traditionally, analyses of customer behavior have mainly focused on transactional behavior, such as the frequency or total amount of purchases. More recently, increasing focus has been placed on non-transactional behavior, owing to the introduction of loyalty programs that reward customers beyond transactions, with the goal that customers should feel appreciated and included regardless of purchasing power. This study is carried out at a global retail company with the aim of increasing the understanding of how non-transactional customer behavior affects customer churn. To achieve this, a statistical model is constructed and used to analyze the significance of variables by means of statistical methods. Two categories of statistical models are examined: (1) Random forest and (2) Logistic regression. In addition, three different methods are used to analyze the significance of variables: (1) Gini importance; (2) Permutation importance; and (3) Coefficient importance. The results show that the study's approach can be used to analyze customer behavior and gain a better understanding of what drives customer churn. Furthermore, the results show that random forest models outperform models based on logistic regression. Based on the definition of customer churn used in this study, the most important factors affecting the probability of churn are the customer's number of sessions and the interval between the customer's sessions.
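Of the three feature importance methods listed in this abstract, permutation importance is the easiest to sketch without a modeling library: shuffle one feature column, re-score the model, and record how much the score drops. The Python sketch below is only an illustration under stated assumptions. The rule-based `toy_model`, the two features (number of sessions, inter-session interval in days), the data and the use of accuracy as the score are all hypothetical and are not taken from the thesis.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with the shuffled value in position j.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: predicts churn (1)
    when the customer has fewer than 3 sessions and ignores the interval."""
    sessions, _interval_days = row
    return 1 if sessions < 3 else 0


# Invented rows: (number of sessions, inter-session interval in days).
rows = [(1, 30.0), (2, 12.0), (5, 3.0), (8, 1.0), (0, 45.0), (6, 2.0)]
labels = [1, 1, 0, 0, 1, 0]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Because `toy_model` never reads the interval, shuffling that column cannot change its predictions, so the second importance is exactly zero. This also shows a limitation of the method: it measures what the model uses, not what is informative in the data.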
