1 |
Customer Churn Predictive Heuristics from Operator and Users' Perspective
MOUNIKA REDDY, CHANDIRI January 2016
Telecommunication operators face growing customer-service pressure as they launch a variety of user-desired services. Delivering a poor customer experience puts customer relationships and revenue at risk. One of the metrics telecommunications companies use to gauge their relationship with customers is "churn". After substantial research on churn prediction over many years, Big Data analytics with data mining techniques was found to be an efficient way to identify churn. These techniques are usually applied to predict customer churn by building models, classifying patterns and learning from historical data. Although some work has already been undertaken on the users' perspective, it appears to be in its infancy. The aim of this thesis is to validate churn predictive heuristics from the operator perspective and close to the user end. This involves conducting experiments with different groups of people regarding their data usage, and designing a model that is close to the user end and fits the data obtained through a survey. It also involves correlating the examined churn indicators and validating them, together with the traffic volume variation, against the users' feedback collected by accompanying theses. A literature review is carried out to analyze previous work, to identify the difficulties faced in analyzing users' sentiment, and to understand methodologies for working around problems with the accuracy of churn prediction algorithms. Experiments are conducted with different groups of people across the globe. Their experiences with call and data quality, whether they intend to change operator in the future, and what their reasons for churning would be are analyzed. Their feedback is validated using existing heuristics. The collected data set is analyzed statistically and validated against different datasets obtained from operators' data.
Statistical and Big Data analysis was also carried out on an operator's monthly data volume usage for active and churned customers. A possible correlation of user churn with users' feedback is studied by calculating percentages, and the results are further correlated with the operators' data and with the data produced by the mobile app. The results show that monthly volumes have little decision power and that additional attributes, such as higher time resolution, age, gender and others, are needed. The global survey, in turn, showed similarities with the operator's customers' feedback and pointed to issues "around the globe" such as data plan issues, pricing, and problems with connectivity and speed. Data preprocessing and feature selection proved to be the key factors. Churn predictive models gave a better classification accuracy of 69.7 % when more attributes were provided. Classification of the telecom operator's data gave an accuracy of 51.7 % after preprocessing and for the variables chosen. Finally, a close observation of the end user revealed the possibility of yielding a much higher classification precision of 95.2 %.
|
2 |
Churn Prediction
Åkermark, Alexander, Hallefält, Mattias January 2019
Churn analysis is an important tool for companies, as it can reduce the costs related to customer churn. Churn prediction is the process of identifying users before they churn; this is done by applying methods to collected data in order to find patterns that help predict new churners in the future. The objective of this report is to identify churners using surveys collected from different golf clubs, their members and guests. This was accomplished by testing several supervised machine learning algorithms in order to find the different classes and to see which algorithms are most suitable for this kind of data. The margin of success was an accuracy greater than the share of the majority class in the dataset. The data was processed using label encoding, one-hot encoding and principal component analysis and was split into 10 folds, 9 training folds and 1 testing fold, with the test and training folds rotated over 10 iterations to ensure cross-validation. Each algorithm processed the training data to create a classifier, which was tested on the test data. The classifiers used for the project were k-nearest neighbours, support vector machine, multi-layer perceptron, decision trees and random forest. The classifiers generally had an accuracy of around 72%, and the best classifier, random forest, had an accuracy of 75%. All the classifiers had an accuracy above the margin of success. K-folding, confusion matrices, classification reports and other internal cross-validation techniques were applied to the data to ensure the quality of the classifier. The project was a success, although there is a strong belief that its bottleneck was the quality of the data, since new legislation on collecting and storing data results in redundant and faulty data.
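The 10-fold procedure and the "margin of success" described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `k_fold_indices` and `majority_class_share` are hypothetical, the folds are taken in order without shuffling, and nothing here reproduces the authors' actual pipeline.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def majority_class_share(labels):
    """Accuracy of always predicting the most common class (the baseline to beat)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)
```

A classifier would be trained on each `train_idx` and scored on the matching `test_idx`; its mean accuracy over the folds is then compared with `majority_class_share(labels)`.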
|
3 |
Customer Churn Prediction Using Big Data Analytics
TANNEEDI, NAREN NAGA PAVAN PRITHVI January 2016
Customer churn is always a grievous issue for the telecom industry, as customers do not hesitate to leave if they don't find what they are looking for. They want competitive pricing, value for money and, above all, high-quality service. Customer churn is directly related to customer satisfaction. It is well known that the cost of customer acquisition is far greater than the cost of customer retention, which makes retention crucial for the business. There is no standard model that accurately addresses the churn issues of global telecom service providers. Big Data analytics with machine learning was found to be an efficient way of identifying churn. This thesis aims to predict customer churn using Big Data analytics, namely a J48 decision tree in WEKA, a Java-based benchmark tool. Three datasets from different sources were considered: the first contains a telecom operator's six-month aggregate data usage volumes for active and churned users; the second contains globally surveyed data; and the third comprises an individual weekly data usage analysis of 22 Android customers along with their average quality, annoyance and churn scores, collected by accompanying theses. Statistical analyses were performed and J48 decision trees were built for the three datasets. In the statistics of the normalized volumes, autocorrelations were small with reliable confidence intervals, but the confidence intervals overlapped and lay close together, so no significant differences and hence no strong trends could be observed. The decision tree analysis achieved accuracies of 52%, 70% and 95% for the three data sources respectively. Data preprocessing, data normalization and feature selection proved to be highly influential. Monthly data volumes did not show much decision power. Average quality, churn risk and, to some extent, annoyance scores may point to a probable churner.
Weekly data volumes combined with the customer's recent history and necessary attributes such as age, gender, tenure, bill, contract and data plan are pivotal for churn prediction.
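J48 is WEKA's implementation of the C4.5 decision tree, which chooses splits by gain ratio. The sketch below illustrates that criterion for a single nominal attribute. It is a simplified illustration only: the function names and the toy attribute values are assumptions, and real J48 additionally handles numeric thresholds, missing values and pruning.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that separates churners from non-churners perfectly scores 1.0, while one that is unrelated to the label scores 0.0.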
|
4 |
Proposta para previsão de evasão baseada em padrões de acesso de usuários em jogos online. / Proposal for churn prediction based on online games users' access patterns.
Castro, Emiliano Gonçalves de 24 May 2011
The online gaming market has grown rapidly in recent years, particularly since the rise of the service-based business model. As a result, the publishers of these games have come to share problems common to the services business, such as the profit erosion caused by customer churn. Predictive models have been used to address the churn problem in the mobile phone and credit card markets, where companies hold a large volume of demographic and economic data about their customers. Game publishers, in contrast, often have only their users' email addresses. The goal of this study is to propose a model for churn prediction based solely on the access patterns of online game users, where these time records are fed into a set of operators that analyze the data in the time-frequency plane domain, using the Discrete Wavelet Transform. Its main contribution is the proposed parameterization of the input data for probabilistic classifiers based on the k-Nearest Neighbors algorithm. Tested with real data on user accesses to an online game over a few months, the classifiers were evaluated using ROC (Receiver Operating Characteristic) and lift curves. The approach proposed in this thesis, based on analysis in the time-frequency plane domain, showed satisfactory results. They were not only better than those of approaches based on either the time or the frequency domain, but also comparable to the performance of models with hundreds of predictive variables used in other markets.
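To give a feel for what analysis in the time-frequency plane means for an access log, the sketch below applies a Discrete Wavelet Transform to a short series of access counts. The Haar wavelet is an assumption made for simplicity, since the abstract does not say which wavelet was used, and the function names are hypothetical.

```python
import math


def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT: returns the final approximation and the details per level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details
```

The detail coefficients capture changes in activity at progressively coarser time scales, and the final approximation captures the overall level. Coefficients of this kind could serve as the input features of a k-Nearest Neighbors classifier.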
|
6 |
Predicting user churn on streaming services using recurrent neural networks / Förutsägande av användarens avbrott på strömmande tjänster med återkommande neurala nätverk
Martins, Helder January 2017
Providers of online services have witnessed rapid growth of their user base in the last few years. The phenomenon has attracted an increasing number of competitors determined to obtain their own share of the market. In this context, the cost of attracting new customers has increased significantly, raising the importance of retaining existing clients. It has therefore become progressively more important for companies to improve user experience and ensure they keep a larger share of their users active in consuming their product. Companies are thus compelled to build tools that can identify what prompts customers to stay and also identify the users intent on abandoning the service. The focus of this thesis is to address the problem of predicting user abandonment, also known as "churn", and to detect motives for user retention in data provided by an online streaming service. Classical models such as logistic regression and random forests have been used in the past to predict a customer's churn probability with a fair amount of precision, commonly by aggregating all known information about a user over a time period into a single data point. On the other hand, recurrent neural networks, especially the long short-term memory (LSTM) variant, have shown impressive results in other domains such as speech recognition and video classification, where the data is instead treated as a sequence. This thesis investigates how LSTM models perform at predicting churn compared to standard non-sequential baseline methods when applied to user behavior data from a music streaming service. It also explores how different aspects of the data, such as the distribution between the churning and retaining classes, the size of the user event history and the feature representation, influence the performance of predictive models.
The results show that LSTMs perform comparably to random forests for churn detection while being significantly better than logistic regression. LSTMs therefore appear to be a suitable choice for predicting churn at the user level. Additionally, a framework for creating a dataset suitable for training predictive models is provided, which can be explored further to analyze user behavior and to create retention actions that minimize customer abandonment.
|
7 |
Telecommunications Data Mining for Churn Prediction
Chiu, I-Tang 06 August 2001
Abstract
As deregulation and new competitors open up the telecommunications industry, the cellular phone market has become more competitive than ever. To survive or maintain an advantage in such a competitive marketplace, many telecommunications companies are turning to data mining techniques to resolve such challenging issues as fraud detection, customer retention, and prospect profiling. In this thesis, we focused on developing and applying data mining techniques to support churn prediction. Constrained by limited customer profiles and general demographics, the proposed approach applied a decision tree induction technique (i.e., C4.5) to discover a classification model for churn prediction based solely on call records. To deal with training data that has a highly skewed distribution of decisions (i.e., around 2% churners and 98% non-churners), a multi-expert strategy was adopted. The empirical results showed that the proposed technique was effective in predicting at-risk cellular phone customers (i.e., potential churners). The proposed technique could identify 50.64% of churners by selecting 10.03% of the population, and 68.62% of churners by selecting 29.00% of the population.
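The two operating points reported above can be read as lift values, that is, how many times more churners are captured than a random selection of the same size would capture. The short calculation below works this out: the lift is roughly 5.05 at 10.03% of the population and roughly 2.37 at 29.00%. The precision estimate assumes a churn rate of exactly 2%, which the abstract only gives as "around 2%", and the function names are hypothetical.

```python
def lift(captured_share, selected_share):
    """Share of all churners captured divided by the share of the population selected."""
    return captured_share / selected_share


def precision_in_selection(captured_share, selected_share, churn_rate):
    """Fraction of the selected customers who are churners, given the overall churn rate."""
    return churn_rate * lift(captured_share, selected_share)


lift_top_10 = lift(0.5064, 0.1003)   # 50.64% of churners found in 10.03% of customers
lift_top_29 = lift(0.6862, 0.2900)   # 68.62% of churners found in 29.00% of customers
# Assumed churn rate of exactly 2% (the abstract says "around 2%").
precision_top_10 = precision_in_selection(0.5064, 0.1003, 0.02)
```

Under that assumption, about one in ten of the customers in the top 10.03% would be an actual churner, compared with one in fifty in the population as a whole.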
|
8 |
CUSTOMER CHURN PREDICTION MODEL IN TELECOMMUNICATION SECTOR USING MACHINE LEARNING TECHNIQUE
Taskin, Nayema January 2023
Customer churn is a critical problem faced by telecom companies, leading to lost revenue and increased marketing costs. In the highly competitive telecommunication sector, customer retention is essential for success. It costs five to seven times more to acquire a new customer than it does to retain an existing one. Considering this, churn prediction models are increasingly becoming an important tool for telecommunication organizations looking to minimize their customer attrition rate. Churn, or customer attrition, is a major problem for businesses in the telecommunications sector. Every year, millions of customers switch to new service providers, resulting in billions of dollars in lost revenue. In the ever-evolving and highly competitive world of telecommunications, businesses are constantly looking for new ways to improve customer loyalty and reduce customer churn. Machine learning techniques can be incredibly useful in this endeavor. This study proposes a customer churn prediction model using machine learning techniques to help telecom companies retain customers and reduce churn rates. The proposed model analyzes big data using machine learning algorithms, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), AdaBoost, Light Gradient Boosting Machine (LGBM), Gradient Boosting, and Extreme Gradient Boosting (XGBoost), to predict customer churn. The proposed model achieves a high accuracy score of 95.74% with the XGBoost and LGBM classifiers. The results demonstrate that machine learning algorithms have the potential to predict customer churn effectively and provide insights into the primary drivers of customer churn.
|
9 |
Enhancing Telecom Churn Prediction: Adaboost with Oversampling and Recursive Feature Elimination Approach
Tran, Long Dinh 01 June 2023
Churn prediction is a critical task for businesses to retain their valuable customers. This paper presents a comprehensive study of churn prediction in the telecom sector using 15 approaches, including popular algorithms such as Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, and AdaBoost.
The study is segmented into three sets of experiments, each focusing on a different approach to building the churn prediction model. The model is constructed using the original training set in the first set of experiments. The second set involves oversampling the training set to address the issue of imbalanced data. Lastly, the third set combines oversampling with recursive feature selection to enhance the model's performance further.
The results demonstrate that the Adaptive Boost classifier, implemented with oversampling and recursive feature selection, outperforms the other 14 techniques. It achieves the highest rank in all three evaluation metrics: recall (0.841), f1-score (0.655), and roc_auc (0.793), further indicating that the proposed approach effectively predicts churn and provides valuable insights into customer behavior.
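The oversampling step in the second and third sets of experiments can be illustrated with plain random oversampling, which duplicates minority-class rows until the classes are balanced. This is a sketch under assumptions: the abstract does not say which oversampling method was used (it may well have been a synthetic method rather than duplication), and the function name is hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement from this class until it reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

As in the study, this would be applied to the training set only, so that the test set keeps the original class imbalance and the reported metrics are not inflated by duplicated rows.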
|
10 |
A Machine Learning Ensemble Approach to Churn Prediction : Developing and Comparing Local Explanation Models on Top of a Black-Box Classifier / Maskininlärningsensembler som verktyg för prediktering av utträde : En studie i att beräkna och jämföra lokala förklaringsmodeller ovanpå svårförståeliga klassificerare
Olofsson, Nina January 2017
Churn prediction methods are widely used in Customer Relationship Management and have proven to be valuable for retaining customers. To obtain high predictive performance, recent studies rely on increasingly complex machine learning methods, such as ensemble or hybrid models. However, the more complex a model is, the more difficult it becomes to understand how decisions are actually made. Previous studies on machine learning interpretability have used a global perspective for understanding black-box models. This study explores the use of local explanation models for explaining the individual predictions of a Random Forest ensemble model. Churn prediction was studied on the users of Tink, a finance app. This thesis aims to take local explanations one step further by comparing churn indicators between different user groups. Three sets of groups were created based on differences in three user features. The importance scores of all globally found churn indicators were then computed for each group with the help of local explanation models. The results showed that the groups did not differ significantly in the globally most important churn indicators. Instead, differences were found for globally less important churn indicators, concerning the type of information that users stored in the app. In addition to comparing churn indicators between user groups, this study resulted in a well-performing Random Forest ensemble model able to explain the reasons behind churn predictions for individual users. The model proved to be significantly better than a number of simpler models, with an average AUC of 0.93.
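For readers unfamiliar with the AUC figure quoted above, the sketch below computes it directly from its probabilistic meaning: the chance that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The function name and the toy scores are illustrative assumptions and are not data from the thesis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: churners (label 1) scored 0.9 and 0.3, non-churners scored 0.8 and 0.1.
example_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In the toy example three of the four churner/non-churner pairs are ranked correctly, so the AUC is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.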
|