  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Evaluating Random Forest and a Long Short-Term Memory in Classifying a Given Sentence as a Question or Non-Question

Ankaräng, Fredrik, Waldner, Fabian January 2019 (has links)
Natural language processing and text classification are topics of much discussion among machine learning researchers. New methods and models are presented every year. However, less attention is paid to comparing models, especially to comparing less complex models with state-of-the-art ones. This paper compares a Random Forest with a Long Short-Term Memory (LSTM) neural network for the task of classifying sentences as questions or non-questions, without considering punctuation. The models were trained and optimized on chat data from a Swedish insurance company, as well as on user comments on newspaper articles. The results showed that the LSTM model performed better than the Random Forest. However, the difference was small, so Random Forest could still be preferable in some use cases due to its simplicity and its ability to handle noisy data. The models' performance did not improve dramatically after hyperparameter optimization. A literature study was also conducted to explore how customer service can be automated using a chatbot, and which features and functionality management should prioritize during such an implementation. The findings showed that a data-driven design should be used, where features are derived from the specific needs and customers of the organization. However, three features were general enough to be presented: the personality of the bot, its trustworthiness, and the stage of the value chain at which the chatbot is implemented. / Language technology and text classification are scientific fields that have received much attention from machine learning researchers. New methods and models are presented every year, but less focus is directed at comparing models of different kinds.
This thesis compares Random Forest with a Long Short-Term Memory neural network by examining how well the models classify sentences as questions or non-questions, without taking punctuation into account. The models were trained and optimized on user data from a Swedish insurance company, as well as on comments from news articles. The results showed that the LSTM model performed better than Random Forest. However, the difference was small, which means that Random Forest may still be a better alternative in some situations thanks to its simplicity. The models' performance did not improve considerably after hyperparameter optimization. A literature study was also conducted to examine how customer support tasks can be automated by introducing a chatbot, and which functions management should prioritize ahead of such an implementation. The study showed that a data-driven approach was preferable, in which functionality is determined by the specific needs of the users and the organization. However, three functions were general enough to be presented: the chatbot's personality, its trustworthiness, and the stage of the value chain at which it is implemented.
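The abstract states that sentences are classified without considering punctuation. A minimal preprocessing sketch follows. It assumes lowercasing, whitespace tokenization and Python's `string.punctuation` (ASCII marks only, so typographic quotes such as ” would survive) as the set of characters to drop; none of these choices is stated in the thesis. Removing the trailing "?" forces a model to rely on the words alone.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize_without_punctuation(sentence: str) -> list[str]:
    """Lowercase the sentence, delete ASCII punctuation and split on whitespace."""
    return sentence.lower().translate(_STRIP_PUNCT).split()


def bag_of_words(tokens: list[str]) -> Counter:
    """Token counts: one simple feature representation a Random Forest could use."""
    return Counter(tokens)


if __name__ == "__main__":
    # The question mark is gone, so the cue it carried is unavailable to the model.
    print(tokenize_without_punctuation("Can I change my insurance plan?"))
    # ['can', 'i', 'change', 'my', 'insurance', 'plan']
```

An LSTM would consume the token sequence in order, while a Random Forest would more typically see an unordered representation such as these counts.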
262

Predicting Risk Level in Life Insurance Application : Comparing Accuracy of Logistic Regression, Decision Tree, Random Forest and Linear Support Vector Classifiers

Karthik Reddy, Pulagam, Veerababu, Sutapalli January 2023 (has links)
Background: Over the last decade, the life insurance industry has grown significantly. Every life insurance application is associated with some level of risk, which determines the premium charged. Evaluating this level of risk for a life insurance application is time-consuming, and in the present scenario it is hard for the insurance industry to process millions of life insurance applications. One potential approach is to use machine learning to establish a framework for evaluating the level of risk associated with a life insurance application. Objectives: The aim of this thesis is to perform two comparison studies. The first study compares the accuracy of the logistic regression classifier (LRC), decision tree classifier (DTC), random forest classifier (RFC) and linear support vector classifier (Linear SVC) for evaluating the level of risk associated with a life insurance application. The second study aims to identify the impact of changes in the dataset on the accuracy of these selected classification models. Methods: An experimentation methodology was chosen to attain the aim of the thesis and address its research questions. The experiment compared four ML algorithms, namely the LRC, DTC, RFC and Linear SVC. These algorithms were trained, validated and tested on two datasets. A new dataset was created by replacing the "BMI" variable with the "Life Expectancy" variable. The four selected ML algorithms were compared based on their performance metrics: accuracy, precision, recall and F1-score. Results: Among the four selected machine learning algorithms, the random forest classifier attained the highest accuracy, with 53.79% and 52.80% on the unmodified and modified datasets respectively. Hence, it was the most accurate algorithm for predicting risk level in life insurance applications. The second best algorithm was the decision tree classifier, with 51.12% and 50.79% on the unmodified and modified datasets.
The selected models attained higher accuracies when trained, validated and tested with the unmodified dataset. Conclusions: The random forest classifier scored the highest accuracy among the four selected algorithms on both the unmodified and modified datasets. The selected models attained higher accuracies when trained, validated and tested with the unmodified dataset than with the modified dataset. Therefore, the unmodified dataset is more suitable for predicting risk level in life insurance applications.
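The comparison above rests on accuracy, precision, recall and F1-score over several risk levels. A minimal sketch of these metrics for a multi-class problem follows; macro averaging (an unweighted mean over classes) is an assumption here, since the abstract does not say how per-class scores were combined.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging is an assumption for this sketch; the thesis does not
    state which averaging it used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # how often each class actually occurs
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / pred_count[label] if pred_count[label] else 0.0
        r = tp[label] / true_count[label] if true_count[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

With hypothetical risk levels `y_true = [1, 1, 2, 2, 3, 3]` and `y_pred = [1, 2, 2, 2, 3, 1]`, accuracy is 4/6 while macro recall is 2/3, illustrating that the metrics can diverge when errors concentrate in particular classes.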
263

Probability of Default Machine Learning Modeling : A Stress Testing Evaluation

Andersson, Tobias, Mentes, Mattias January 2023 (has links)
This thesis aims to assist in the development of machine learning models tailored for stress testing. The main objective is to create models that can predict loan defaults while considering the impact of macroeconomic stress. By achieving this, Nordea can continue developing machine learning models for stress testing, using these models as a basis for further advancement. The research begins with an analysis of historical loan data, encompassing diverse customer and macroeconomic variables that influence loan default rates. Using machine learning algorithms, feature selection methods, data imbalance management and model training techniques, we construct a set of predictive models. These models aim to capture the intricate relationships between the identified variables and loan defaults, ensuring their suitability for stress testing purposes. The subsequent phase of the research subjects the developed models to simulated adverse economic conditions during stress testing. By evaluating the models' performance under various stressed scenarios, we assess their predictive ability. This stress testing process allows us to analyse how well the models incorporate a stressed scenario in their predictions. The thesis concludes with an evaluation of the developed machine learning models and their ability to identify defaulted loans in a stressed macroeconomy. By creating these models specifically tailored for stress testing loans, we provide a basis for further development within the area of stress testing modeling. / This thesis aims to contribute to the development of machine learning models suited for stress testing. The main goal is to create models that can predict which loans will default while taking the impact of macroeconomic stress into account.
By achieving this, Nordea can continue developing machine learning models for stress testing by using the models as a basis for further development. The work begins with an analysis of historical loan data, covering various customer and macroeconomic variables that affect loans. Using machine learning algorithms, methods for selecting explanatory variables, data imbalance handling and model training techniques, a set of predictive models is constructed. These models aim to capture the complex relationships between the identified variables and loan defaults and to ensure their suitability for stress testing. The subsequent phase of the work focuses on subjecting the developed models to simulated stressed economic conditions. By evaluating the models' performance under various stressed conditions, their ability to predict defaulted loans is assessed. This stress testing process allows us to analyse the models' ability to incorporate stressed conditions into their predictions. The thesis concludes with an evaluation of the developed machine learning models and their ability to identify defaulted loans in a stressed macroeconomy. By creating these models specifically adapted for stress testing of loans, we provide a basis for further development in the field.
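The abstract lists "data imbalance management" without naming a method; defaulted loans are usually a small minority of a loan book. As one common option (swapped in here for illustration, not necessarily what the thesis used), random oversampling duplicates minority-class rows until the classes are balanced:

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one (random oversampling)."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not rows:
        return [], []
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

Resampling of this kind is normally applied to the training split only, so that evaluation (including under stressed scenarios) still reflects the true default rate.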
264

Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms

Ahlqvist, Oskar January 2023 (has links)
Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the involved parties. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question: How much can Random Forest, Logistic Regression and Support Vector Machine contribute to the detection of fraud in affiliate marketing? It is addressed through two sub-questions: 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate how effectively the classifiers distinguish fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis shows that each feature category had a different impact on the performance of the models: the models compute different feature importances, so some features have greater influence on specific models. Fine-tuning and optimizing the hyperparameters of each model can further enhance its performance. Despite certain limitations, such as time constraints, data availability and security restrictions, this study highlights the potential of supervised machine learning algorithms.
In particular, random forest showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and their practical applications for fraud detection.
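The evaluation described above combines k-fold cross-validation with AUC-ROC. The sketch below computes AUC as the probability that a random fraudulent example is scored above a random legitimate one, and averages it over folds. Stratifying the folds (so each contains both classes) is an assumption added here, and `fit_predict` is a hypothetical placeholder for any of the three classifiers.

```python
import random


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive (fraud, label 1)
    scores higher than a random negative (label 0); ties count as one half.
    O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split indices into k folds, dealing each class round-robin so every
    fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X, y, fit_predict, k=5, seed=0):
    """Mean AUC over k stratified folds. `fit_predict(X_train, y_train, X_test)`
    is a hypothetical callable that trains a model and returns scores for X_test."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        scores = fit_predict([X[i] for i in train_idx],
                             [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported mean of 0.7172 in context.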
265

What Matters the Most? Understanding Individual Tornado Preparedness Using Machine Learning

Choi, Junghwa, Robinson, Scott, Maulik, Romit, Wehde, Wesley 01 August 2020 (has links)
Scholars from various disciplines have long attempted to identify the variables most closely associated with individual preparedness. As a result, we now know much more about these factors and their association with individual preparedness behaviors. However, how decisive many of these factors are in encouraging preparedness has not been sufficiently discussed. In this article, we examine which factors, among the many studied previously, are most central to engendering emergency preparedness in individuals, particularly for tornadoes, by using a machine learning technique that is relatively uncommon in the disaster management literature. Using unique survey data, we find that in the case of tornado preparedness the most decisive variables are related to personal experiences and economic circumstances rather than basic demographics. Our findings contribute to scholarly endeavors to understand and promote individual tornado preparedness behaviors by highlighting the variables most likely to shape tornado preparedness at an individual level.
266

A Machine Learning approach to churn prediction in a subscription-based service / Using machine learning to predict churn for a subscription-based product

Blank, Clas, Hermansson, Tomas January 2018 (has links)
Subscription services are becoming increasingly popular in today's society. One of the keys to success with a subscription-based business model is minimizing customer attrition (churn), i.e. customers who end their subscription within a given period. With increasing digitalization, it is now easier than ever to collect data. At the same time, machine learning is growing rapidly and becoming increasingly accessible, which opens up new approaches to problem solving. This report tests and evaluates an attempt to predict churn with machine learning, based on customer data from a company with a subscription-based business model in which subscribers can attend live events for a fixed monthly fee. The machine learning models used in the tests were Random Forests, Support Vector Machines, Logistic Regression and Neural Networks, all trained on user data from the company. The models achieved final accuracies ranging from 73.7% to 76.7%. In addition, the models tended to achieve higher precision and recall when classifying customers who had cancelled their subscription than those who were still active. Furthermore, the customer features with the greatest impact on the classification were "Tickets Used" and "Length of Subscription". Finally, the report discusses how information about which customers are likely to end their subscription can be used from a more business-oriented perspective. / In today's world, subscription-based online services are becoming increasingly popular. One of the keys to success in a subscription-based business model is minimizing churn, i.e. customers canceling their subscriptions. Due to the digitalization of the world, data is easier to collect than ever before. At the same time, machine learning is growing and becoming more accessible.
This opens up new possibilities to solve different problems with machine learning. This paper tests and evaluates a machine learning approach to churn prediction, based on user data from a company with an online subscription service that lets users attend live shows for a fixed price. The tests used different machine learning models, both individually and combined: Random Forests, Support Vector Machines, Logistic Regression and Neural Networks. To train them, a dataset containing active and churned users was provided. The models achieved accuracies ranging from 73.7% to 76.7% when classifying churners based on their activity data. Furthermore, the models had higher precision and recall for classifying churners than non-churners. The features with the most impact on the classification were Tickets Used and Length of Subscription. Finally, this paper discusses how churn prediction can be used from a business perspective.
267

Using Machine Learning to Predict Employee Resignation in the Swedish Armed Forces

Foley, Amanda January 2019 (has links)
Since the Swedish government reinstated conscription in 2017, the Swedish Armed Forces are once again able to meet the wartime staffing requirements. In addition to the increase in employees, the Swedish Armed Forces have been able to shift focus from external recruiting to internal human resource management. High employee turnover is costly, especially in an organization like this one, where the initial investments in training are expensive and arduous. Predicting which employees are about to resign can help retain employees, decrease turnover and in turn save resources. With sufficient data, machine learning can be used to predict which employees are about to resign. This study shows that a machine learning model, random forest, can increase the accuracy and precision of such predictions, and points to variables and behavioral indicators found to correlate strongly with employee resignation. / This work explores the possibility of using machine learning, more specifically the random forest model, to predict employee resignations in the Swedish Armed Forces. The work stems from the reinstatement of conscription in 2017, which followed from the fact that only about 60% of the wartime staffing requirement could be met with the voluntary model. The work finds that the random forest model can predict resignations to a non-trivial degree: it predicts resignations with 89% accuracy and 72% precision. The largest source of uncertainty in the study is the amount and characteristics of the data. The study is based on data from 1,500 full-time squad leaders, soldiers and sailors (GSS-K). To improve the results, and the precision in particular, more data is needed, as well as data with a stronger correlation to behavior.
For future studies, it is recommended to explore whether other machine learning models are suited to this particular organization, as well as how the collection and management of data within the Swedish Armed Forces can be developed.
268

Prediktion av efterfrågan i filmbranschen baserat på maskininlärning (Demand Prediction in the Film Industry Based on Machine Learning)

Liu, Julia, Lindahl, Linnéa January 2018 (has links)
Machine learning is a central technology in data-driven decision making. This study investigates machine learning for demand forecasting in the motion picture industry from the perspective of film exhibitors. More specifically, it investigates to what extent the technology can assist in estimating public interest, in terms of revenue levels, for unreleased movies. Three machine learning models are implemented to forecast cumulative revenue levels during the opening weekend of various movies released in Sweden between 2010 and 2017. The forecast is based on ten attributes, ranging from public online user-generated data to specific movie characteristics such as production budget and cast. The results indicate that the choice of attributes and models in this study was not optimal for the Swedish market, as the values obtained for the relevant precision metrics were inadequate, although there were valid underlying reasons for this. / Machine learning is a central technology in data-driven decision making. This report investigates machine learning in the context of demand prediction in the film industry from the perspective of cinemas. More specifically, it examines to what extent the technology can assist cinemas in estimating public interest, in terms of revenue, for unreleased films. Three machine learning models are implemented to forecast cumulative revenue levels during the opening weekend for films that premiered in Sweden between 2010 and 2017. The forecast is based on varied attributes, ranging from public user-generated online data to film-specific variables such as production budget and cast. The results show that the choice of attributes and models was not optimal for the Swedish market, as the precision metrics obtained from the models took low values, for relevant underlying reasons.
269

Analyses Of Crash Occurrence And Injury Severities On Multi Lane Highways Using Machine Learning Algorithms

Das, Abhishek 01 January 2009 (has links)
Reducing crash occurrence at various roadway locations (mid-block segments, signalized intersections and unsignalized intersections) and mitigating injury severity in the event of a crash are major concerns of transportation safety engineers. Multi-lane arterial roadways (excluding freeways and expressways) account for forty-three percent of fatal crashes in the state of Florida. Significant contributing causes fall under the broad categories of aggressive driver behavior, adverse weather and environmental conditions, and roadway geometric and traffic factors. The objective of this research was to implement innovative, state-of-the-art analytical methods to identify the contributing factors for crashes and injury severity. Advances in computational methods enable the use of modern statistical and machine learning algorithms. Even though most of the contributing factors are known a priori, advanced methods unearth changing trends. Heuristic evolutionary processes such as genetic programming, sophisticated data mining methods such as conditional inference trees, and mathematical treatments in the form of sensitivity analyses constitute the major contributions of this research. The application of traditional statistical methods such as simultaneous ordered probit models, and the identification and resolution of crash data problems, are also key aspects of this study. To eliminate the use of an unrealistic uniform intersection influence radius of 250 ft, heuristic rules were developed for assigning crashes to roadway segments, signalized intersections and access points using parameters such as 'site location', 'traffic control' and node information. Using Conditional Inference Forest instead of Classification and Regression Tree to identify significant variables for injury severity analysis removed the bias towards selecting continuous variables or variables with a large number of categories.
For the injury severity analysis of crashes on highways, the corridors were clustered into four optimum groups. The optimum number of clusters was found using the Partitioning Around Medoids algorithm. Concepts from evolutionary biology such as crossover and mutation were implemented to develop models for classification and regression analyses based on the highest hit rate and minimum error rate, respectively. A low crossover rate and a higher mutation rate reduce the chances of genetic drift and bring novelty to the model development process. Annual daily traffic, friction coefficient of pavements, on-street parking, curbed medians, surface and shoulder widths, and alcohol/drug usage are some of the significant factors that played a role in both crash occurrence and injury severity. Relative sensitivity analyses were used to identify the effect of continuous variables on the variation of crash counts. This study improved the understanding of significant factors that could play an important role in designing better safety countermeasures on multi-lane highways, and hence in enhancing their safety by reducing the frequency of crashes and the severity of injuries. Educating young people about alcohol and drug abuse, specifically at high schools and colleges, could potentially lead to lower driver aggression. Unilaterally removing on-street parking from high-speed arterials would likely result in a drop in the number of crashes. Widening shoulders could give drivers greater maneuvering space. Improving pavement conditions for a better friction coefficient will lead to improved crash recovery. Adding lanes to alleviate problems arising from increased ADT, and restricting trucks to the slower right lanes on highways, would not only reduce crash occurrence but also result in lower injury severity levels.
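The corridor clustering above uses Partitioning Around Medoids (PAM). A minimal sketch of the classic algorithm follows: a greedy BUILD phase picks initial medoids, then a SWAP phase exchanges a medoid with a non-medoid whenever that lowers the total distance. The distance function and the toy 1-D data are hypothetical; the thesis's corridor features and its criterion for choosing four clusters are not given in the abstract.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Partitioning Around Medoids: greedy BUILD, then SWAP until no exchange
    of a medoid with a non-medoid lowers the total cost."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    # BUILD: repeatedly add the point that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i], dist))
        medoids.append(best)
    # SWAP: apply the best improving (medoid, non-medoid) exchange until none remains.
    cost = total_cost(points, medoids, dist)
    while True:
        best_swap, best_cost = None, cost
        for m, o in product(medoids, range(n)):
            if o in medoids:
                continue
            candidate = [o if x == m else x for x in medoids]
            c = total_cost(points, candidate, dist)
            if c < best_cost:
                best_swap, best_cost = candidate, c
        if best_swap is None:
            break
        medoids, cost = best_swap, best_cost
    # Assign each point to the index (0..k-1) of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return medoids, labels, cost
```

For example, `pam([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], 2, lambda a, b: abs(a - b))` settles on the points 1.5 and 10.5 as medoids. Because medoids are actual observations, PAM works with any dissimilarity measure, which is one reason it suits mixed roadway attributes better than k-means.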
270

Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology

Mistry, Pritesh, Neagu, Daniel, Trundle, Paul R., Vessey, J.D. 22 October 2015 (has links)
Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means of selecting vehicles without experimental trials would therefore benefit the industry by saving time and money. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work on using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reducing a drug's toxicity. Using data acquired from the National Institutes of Health's (NIH) Developmental Therapeutics Program (DTP), we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and to build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80% using random forest models, whilst the decision tree models produce accuracies in the 70% region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
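The methodology above labels each drug with the vehicle giving the best toxicity profile via an area-under-a-curve comparison. A minimal sketch using the trapezoidal rule follows. It assumes each vehicle has a curve of toxic effect against dose and that a smaller area means a better (less toxic) profile; the paper's actual curve definition and decision rule may differ, and the vehicle names are hypothetical.

```python
def trapezoid_auc(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i]); xs ascending."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points with matching lengths")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be strictly increasing")
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def best_vehicle(curves):
    """Return the vehicle whose toxicity curve has the smallest area, plus all areas.

    `curves` maps a vehicle name to (doses, toxic_effect). Treating a smaller
    area as a better profile is an assumption made for this sketch.
    """
    areas = {v: trapezoid_auc(xs, ys) for v, (xs, ys) in curves.items()}
    return min(areas, key=areas.get), areas


if __name__ == "__main__":
    name, areas = best_vehicle({
        "vehicle_A": ([0, 1, 2], [0, 2, 4]),  # hypothetical curve, area 4.0
        "vehicle_B": ([0, 1, 2], [0, 1, 1]),  # hypothetical curve, area 1.5
    })
    print(name, areas)
```

The winning vehicle for each drug would then serve as the class label that the decision tree and random forest classifiers learn to predict from drug descriptors.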
