Global ETD Search

261	Anomaly Detection for Network Traffic in a Resource Constrained Environment
Lidholm, Pontus; Ingletto, Gaia. January 2023
Networks connected to the internet are under constant threat of attack. This thesis shows that new techniques utilising already connected hardware are a viable way to protect against such threats. By equipping network switches with lightweight machine learning models, such as Decision Tree and Random Forest, no additional devices need to be installed on the network. When an attack is detected, the device may notify or take direct action on the network to protect vulnerable systems. By utilising container software on Westermo's devices, a model has been integrated while limiting its computational resources. Such a system, and its building blocks, are what this thesis has researched and implemented. The system has been validated using multiple models over a range of parameters. These models have been trained offline on datasets with pre-recorded attacks. The recordings are converted into flows, decreasing dataset size and increasing information density. These flows contain features corresponding to information about the packets and statistics about the flows. During training, a subset of features was selected using a Genetic Algorithm, decreasing the time for processing each packet. After the models have been trained, they are converted to C code, which runs on a network switch. These models are verified online, using a simulated factory, launching different attacks on the network. Results show that the hardware is sufficient for smaller models and that the system is capable of detecting certain types of attacks.
Network Traffic; Anomaly Detection; Embedded Systems; Machine Learning; Random Forest; Embedded Systems; Inbäddad systemteknik; Communication Systems; Kommunikationssystem; Computer Systems; Datorsystem
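The conversion of packet recordings into flows described in record 261 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the packet fields (src, dst, sport, dport, proto, size, ts), the 5-tuple flow key, and the chosen statistics are all hypothetical.

```python
from collections import defaultdict


def packets_to_flows(packets):
    """Group packets by a 5-tuple key and compute simple per-flow statistics.

    Each packet is a dict with hypothetical keys: src, dst, sport, dport,
    proto, size (bytes) and ts (seconds). These names are assumptions.
    """
    groups = defaultdict(list)
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        groups[key].append(p)

    flows = {}
    for key, pkts in groups.items():
        sizes = [p["size"] for p in pkts]
        times = [p["ts"] for p in pkts]
        flows[key] = {
            "packet_count": len(pkts),          # number of packets in the flow
            "total_bytes": sum(sizes),          # bytes carried by the flow
            "mean_size": sum(sizes) / len(sizes),
            "duration": max(times) - min(times),
        }
    return flows


# Three packets collapse into two flows, one feature row per flow.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "size": 60, "ts": 0.0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80,
     "proto": "tcp", "size": 1500, "ts": 0.5},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 5555, "dport": 53,
     "proto": "udp", "size": 80, "ts": 0.2},
]
flows = packets_to_flows(packets)
```

Because each flow becomes a single feature row, the dataset shrinks while each row carries more information, which is the effect the abstract describes.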
262	Classification of weather conditions based on supervised learning
Safia, Mohamad; Abbas, Rodi. January 2023
Forecasting the weather remains a challenging task because of the atmosphere's complexity and unpredictable nature. A few of the factors that decide weather conditions, such as rain, clouds, clear skies, and sunshine, include temperature, pressure, humidity, wind speed, and direction. Currently, sophisticated physical models are used to forecast weather, but they have several limitations, particularly in terms of computational time. In the past few years, supervised machine learning algorithms have shown great promise for the precise forecasting of meteorological events. Using historical weather data, these strategies train a model to predict the weather in the future. This study employs supervised machine learning techniques, including k-nearest neighbors (KNNs), support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs), for better weather forecast accuracy. To conduct this study, we employed historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and direction. The data is preprocessed, which includes normalizing it and dividing it into separate training and testing sets. Finally, the effectiveness of different models is examined to determine which is best for producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods for weather forecasting and support the creation of better weather prediction models. / Forecasting the weather is still a challenging task because of the complexity and unpredictable nature of the atmosphere. Some of the factors that affect weather conditions, such as rain, clouds, clear weather, and sunshine, include temperature, pressure, humidity, wind speed, and direction. At present, sophisticated physical models are used to forecast the weather, but they have several limitations, especially with regard to computational time. In recent years, supervised machine learning algorithms have shown great potential for accurately predicting meteorological events. Using historical weather data, these strategies train a model to predict future weather. This study uses supervised machine learning techniques, including k-nearest neighbors (KNNs), support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs), to improve the accuracy of weather forecasts. To carry out this study, we used historical weather data from the Weatherstack API. The data spans several years and contains information on several meteorological variables, including temperature, pressure, humidity, wind speed, and direction. The data is processed in advance, which includes normalization and splitting into separate training and test sets. Finally, the effectiveness of different models is examined to determine which is best for producing accurate weather forecasts. The results of this study provide information on the application of supervised machine learning methods to weather forecasting and support the creation of better weather forecasting models.
Machine learning; Neural networks; Support vector machines; K-nearest neighbours; Random forest; Weather prediction; Computer Sciences; Datavetenskap (datalogi)
263	Network Interconnectivity Prediction from SCADA System Data : A Case Study in the Wastewater Industry / Prediktion av Nätverkssammankoppling från Data Genererat av SCADA System : En fallstudie inom avloppsindustrin
Isacson, Jonas. January 2019
Increased strain on incumbent wastewater distribution networks, originating from population increases as well as climate change, calls for enhanced resource utilization. Being able to accurately predict network interconnectivity is vital within the wastewater industry to enable operational management strategies that optimize the performance of the wastewater system. This thesis presents an evaluation of the network interconnectivity prediction performance of two machine learning models, the multilayer perceptron (MLP) and the support vector machine (SVM), using supervisory control and data acquisition (SCADA) system data for a wastewater system. The results of the thesis imply that the MLP achieves the best predictions of the network interconnectivity. The thesis concludes that the MLP is the superior model and that the highest achievable network interconnectivity accuracy is 56%, which is attained by the MLP model. / The increased strain on current wastewater networks resulting from population growth and climate change creates a need for optimized resource consumption. Being able to correctly predict a wastewater network is desirable, as it enables efficiency-enhancing operational management of the wastewater system. This thesis evaluates how well two machine learning models can predict network interconnectivity using data from a supervisory control and data acquisition (SCADA) system generated by a wastewater network. The two models tested are a multilayer perceptron (MLP) and a support vector machine (SVM). The results of the thesis show that the MLP model achieves the best prediction of network interconnectivity. The thesis concludes that the MLP model is the best model for predicting network interconnectivity and that the highest attainable accuracy was 56%, which was achieved by the MLP model.
MLP; SVM; IoT; Binary Classification; Random Forest; Network Prediction; Wastewater Distribution Network; SCADA; Industry 4.0; Engineering and Technology; Teknik och teknologier
264	Evaluating Random Forest and a Long Short-Term Memory in Classifying a Given Sentence as a Question or Non-Question
Ankaräng, Fredrik; Waldner, Fabian. January 2019
Natural language processing and text classification are topics of much discussion among researchers of machine learning. Contributions in the form of new methods and models are presented on a yearly basis. However, less focus is aimed at comparing models, especially comparing less complex models with state-of-the-art models. This paper compares a Random Forest with a Long Short-Term Memory (LSTM) neural network for the task of classifying sentences as questions or non-questions, without considering punctuation. The models were trained and optimized on chat data from a Swedish insurance company, as well as user comments on articles from a newspaper. The results showed that the LSTM model performed better than the Random Forest. However, the difference was small, and Random Forest could therefore still be a preferable alternative in some use cases due to its simplicity and its ability to handle noisy data. The models' performance was not dramatically improved by hyperparameter optimization. A literature study was also conducted, aimed at exploring how customer service can be automated using a chatbot and what features and functionality should be prioritized by management during such an implementation. The findings of the study showed that a data-driven design should be used, where features are derived from the specific needs and customers of the organization. However, three features were general enough to be presented: the personality of the bot, its trustworthiness, and the stage of the value chain at which the chatbot is implemented. / Language technology and text classification are scientific fields that have received much attention from machine learning researchers. New methods and models are presented every year, but less focus is placed on comparing models of different kinds. This thesis compares Random Forest with a Long Short-Term Memory neural network by examining how well the models classify sentences as questions or non-questions, without taking punctuation into account. The models were trained and optimized on user data from a Swedish insurance company, as well as comments on news articles. The results showed that the LSTM model performed better than Random Forest. The difference was small, however, which means that Random Forest can still be a better alternative in some situations thanks to its simplicity. The models' performance did not improve considerably after hyperparameter optimization. A literature study was also carried out with the aim of examining how customer support tasks can be automated by introducing a chatbot, and which functions should be prioritized by management ahead of such an implementation. The results of the study showed that a data-driven approach was preferable, where the functionality was determined by the specific needs of the users and the organization. Three functions were, however, general enough to be presented: the personality of the chatbot, its trustworthiness, and the step of the value chain in which it is implemented.
Bag-of-Words; Chatbot; Classification; LSTM; Machine Learning; Natural Language Processing; Random Forest; Word2Vec; Computer and Information Sciences; Data- och informationsvetenskap
265	Predicting Risk Level in Life Insurance Application : Comparing Accuracy of Logistic Regression, Decision Tree, Random Forest and Linear Support Vector Classifiers
Karthik Reddy, Pulagam; Veerababu, Sutapalli. January 2023
Background: Over the last decade, there has been a significant rise in the life insurance industry. Every life insurance application is associated with some level of risk, which determines the premium charged. The process of evaluating this level of risk for a life insurance application is time-consuming. In the present scenario, it is hard for the insurance industry to process millions of life insurance applications. One potential approach is to use machine learning to establish a framework for evaluating the level of risk associated with a life insurance application.
Objectives: The aim of this thesis is to perform two comparison studies. The first study aims to compare the accuracy of the logistic regression classifier, decision tree classifier, random forest classifier and linear support vector classifier for evaluating the level of risk associated with a life insurance application. The second study aims to identify the impact of changes in the dataset on the accuracy of these selected classification models.
Methods: An experimental methodology was chosen to attain the aim of the thesis and address its research questions. The experiment compared four ML algorithms, namely the logistic regression classifier (LRC), decision tree classifier (DTC), random forest classifier (RFC) and linear support vector classifier (Linear SVC). These algorithms were trained, validated and tested on two datasets. A new dataset was created by replacing the "BMI" variable with the "Life Expectancy" variable. The four selected ML algorithms were compared based on their performance metrics, which included accuracy, precision, recall and F1-score.
Results: Among the four selected machine learning algorithms, the random forest classifier attained the highest accuracy, with 53.79% and 52.80% on the unmodified and modified datasets respectively. Hence, it was the most accurate algorithm for predicting risk level in life insurance applications. The second-best algorithm was the decision tree classifier, with 51.12% and 50.79% on the unmodified and modified datasets. The selected models attained higher accuracies when they were trained, validated and tested with the unmodified dataset.
Conclusions: The random forest classifier scored the highest accuracy among the four selected algorithms on both the unmodified and modified datasets. The selected models attained higher accuracies when trained, validated and tested with the unmodified dataset than with the modified dataset. Therefore, the unmodified dataset is more suitable for predicting risk level in life insurance applications.
Decision Tree Classifier; Logistic Regression; Machine Learning; Random Forest Classifier; Linear Support Vector Classifier; Computer Sciences; Datavetenskap (datalogi)
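The four performance metrics named in record 265 (accuracy, precision, recall and F1-score) can be illustrated with a small sketch. The labels below are made up for illustration and are not the thesis's data; precision, recall and F1 are computed one-vs-rest for a single class, which is one common convention for multi-class problems and an assumption here.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return accuracy plus one-vs-rest precision, recall and F1 for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical risk levels 1..3 for eight applications.
y_true = [1, 1, 2, 2, 3, 3, 1, 2]
y_pred = [1, 2, 2, 2, 3, 1, 1, 3]
metrics = classification_metrics(y_true, y_pred, positive=1)
```

Accuracy alone can hide weak performance on individual risk levels, which is why comparing precision, recall and F1 alongside it is useful.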
266	Probability of Default Machine Learning Modeling : A Stress Testing Evaluation
Andersson, Tobias; Mentes, Mattias. January 2023
This thesis aims to assist in the development of machine learning models tailored for stress testing. The main objective is to create models that can predict loan defaults while considering the impact of macroeconomic stress. By achieving this, Nordea can continue the development of machine learning models for stress testing by utilizing the models as a basis for further advancement. The research begins with an analysis of historical loan data, encompassing diverse customer and macroeconomic variables that influence loan default rates. Leveraging machine learning algorithms, feature selection methods, data imbalance management and model training techniques, a set of predictive models is constructed. These models aim to capture the intricate relationships between the identified variables and loan defaults, ensuring their suitability for stress testing purposes. The subsequent phase of the research focuses on subjecting the developed models to simulated adverse economic conditions during stress testing. By evaluating the models' performance under various stressed scenarios, their ability to provide predictions is assessed. This stress testing process allows us to analyse the models' capability to incorporate a stressed scenario in their predictions. The thesis concludes with an evaluation of the developed machine learning models and their ability to identify defaulted loans in a stressed macroeconomy. By creating these models specifically tailored for stress testing loans, we will provide a basis for further development within the area of stress testing modeling. / This thesis aims to contribute to the development of machine learning models suited to stress testing. The main goal is to create models that can predict loans that will default while taking into account the impact of macroeconomic stress. By achieving this, Nordea can continue the development of machine learning models for stress testing by using the models as a basis for further development. The work begins with an analysis of historical loan data, covering various customer and macroeconomic variables that affect loans. Using machine learning algorithms, methods for selecting explanatory variables, handling of data imbalance, and model training techniques, a set of predictive models is constructed. These models aim to capture the complex relationships between the identified variables and loan defaults and to ensure their suitability for stress testing. The subsequent phase of the work focuses on subjecting the developed models to simulated stressed economic conditions. By evaluating the models' performance under various stressed conditions, their ability to predict defaulted loans is assessed. This stress testing process allows us to analyse the models' ability to include stressed conditions in their predictions. The thesis concludes with an evaluation of the developed machine learning models and their ability to identify defaulted loans in a stressed macroeconomy. By creating these models specifically adapted for stress testing of loans, we will provide a basis for further development in the field.
Probability of Default; Machine Learning; Stress Testing; Logistic Regression; Decision Tree; Random Forest; Artificial Neural Network; Mathematics; Matematik
267	Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms
Ahlqvist, Oskar. January 2023
Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the involved parties. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question by answering two sub-questions: How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing? 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate the effectiveness of the classifiers in distinguishing fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis demonstrates that each feature category had a different impact on the performance of the models. The models compute different feature importances, meaning that some features had greater influence on specific models. By fine-tuning and optimizing the hyperparameters for each model, it is possible to enhance their performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms. Random forest in particular showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and practical applications for fraud detection.
Fraud detection; Machine learning; Random Forest; support vector machine; Logistic Regression; Classification models; Affiliate marketing; Computer Sciences; Datavetenskap (datalogi)
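The evaluation setup named in record 267, k-fold cross-validation scored by AUC-ROC, can be sketched in plain Python. This is a minimal illustration, not the thesis's experiment: the labels and scores are invented, the folds are contiguous with no shuffling or stratification (an assumption), and AUC is computed directly from its pairwise-ranking definition.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical fraud labels (1 = fraudulent) and classifier scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)

# Index splits for 10 samples and 3 folds; a model would be fitted on each
# train split and scored with roc_auc on the matching test split.
folds = list(k_fold_indices(10, 3))
```

Averaging the per-fold AUC values gives a mean AUC of the kind the abstract reports. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.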
268	What Matters the Most? Understanding Individual Tornado Preparedness Using Machine Learning
Choi, Junghwa; Robinson, Scott; Maulik, Romit; Wehde, Wesley. 01 August 2020
Scholars from various disciplines have long attempted to identify the variables most closely associated with individual preparedness. Therefore, we now have much more knowledge regarding these factors and their association with individual preparedness behaviors. However, it has not been sufficiently discussed how decisive many of these factors are in encouraging preparedness. In this article, we seek to examine what factors, among the many examined in previous studies, are most central to engendering emergency preparedness in individuals, particularly for tornadoes, by utilizing a machine learning technique that is relatively uncommon in the disaster management literature. Using unique survey data, we find that in the case of tornado preparedness the most decisive variables are related to personal experiences and economic circumstances rather than basic demographics. Our findings contribute to scholarly endeavors to understand and promote individual tornado preparedness behaviors by highlighting the variables most likely to shape tornado preparedness at an individual level.
disaster management; emergency preparedness; machine learning; random forest regression; tornado preparedness
269	A Machine Learning approach to churn prediction in a subscription-based service / Användning av maskininlärning för att förutspå churn för en prenumerationsbaserad produkt
Blank, Clas; Hermansson, Tomas. January 2018
Subscription services are becoming increasingly popular in today's society. One of the keys to succeeding with a subscription-based business model is to minimize churn, i.e. customers who cancel their subscription within a certain time period. With increasing digitalization, it is now easier than ever to collect data. At the same time, machine learning is growing rapidly and becoming increasingly accessible, which enables new approaches to problem solving. This report tests and evaluates an attempt to predict churn using machine learning, based on customer data from a company with a subscription-based business model in which the subscriber can attend live events for a fixed monthly fee. The machine learning models used in the tests were Random Forests, Support Vector Machines, Logistic Regression, and Neural Networks, all of which were trained on user data from the company. The models gave a final accuracy in the range of 73.7% to 76.7%. In addition, the models tended to give higher precision and recall when classifying customers who had cancelled their subscription than for those who were still active. Furthermore, the customer features with the greatest impact on the classification were "Tickets Used" and "Length of Subscription". Finally, this report discusses how information about which customers are likely to cancel their subscription can be used from a more business-oriented perspective. / In today's world, subscription-based online services are becoming increasingly popular. One of the keys to success in a subscription-based business model is to minimize churn, i.e. customers canceling their subscriptions. Due to the digitalization of the world, data is easier to collect than ever before. At the same time, machine learning is growing and becoming more available. This opens up new possibilities for solving problems with machine learning. This paper tests and evaluates a machine learning approach to churn prediction, based on user data from a company with an online subscription service that lets the user attend live shows for a fixed price. Different machine learning models were used in the tests, both individually and combined: Random Forests, Support Vector Machines, Logistic Regression and Neural Networks. To train them, a data set of users who were either active or churned was provided. The models returned accuracy results ranging from 73.7% to 76.7% when classifying churners based on their activity data. Furthermore, the models turned out to have higher precision and recall scores for classifying the churners than the non-churners. In addition, the features that had the most impact on the classification were Tickets Used and Length of Subscription. Moreover, this paper discusses how churn prediction can be used from a business perspective.
machine learning; churn; subscription; random forest; SVM; neural network; logistic regression; gini impurity; features; Computer and Information Sciences; Data- och informationsvetenskap
270	Using Machine Learning to Predict Employee Resignation in the Swedish Armed Forces
Foley, Amanda. January 2019
Since the Swedish government reinstated conscription in 2017, the Swedish Armed Forces are once again able to meet the wartime staffing requirements. In addition to the increase in employees, the Swedish Armed Forces have been able to shift focus from external recruiting to internal human resource management. High employee turnover is a costly affair, especially in an organization like this one, where the initial investments, by way of training, are expensive and arduous. Predicting which employees are about to resign can help retain employees and decrease turnover, and in turn save resources. With sufficient data, machine learning can be used to predict which employees are about to resign. This study shows that the random forest machine learning model can increase the accuracy and precision of predictions, and points to variables and behavioral indicators that have been found to have a strong correlation to employee resignation. / This work explores the possibility of using machine learning, specifically the random forest model, to predict employee resignation in the Swedish Armed Forces. The work stems from the reinstatement of conscription in 2017, which followed from only about 60% of the wartime staffing requirement being met under the voluntary model. The work finds that the random forest machine learning model can be used to predict resignations to a non-trivial degree. The random forest model can predict resignations with 89% accuracy and 72% precision. The largest source of uncertainty in the study is the amount and characteristics of the data. The study is based on data from 1500 full-time employed squad leaders, soldiers and sailors (GSS-K). To improve the result, and the precision in particular, more data and data with a stronger correlation to behavior are needed. For future studies, it is recommended to explore whether other machine learning models are suited to this particular organization, and also how the handling, collection and management of data within the Swedish Armed Forces can be developed.
conscription; machine learning; employee turnover; employee retention; random forest; Swedish Armed Forces; Computer and Information Sciences; Data- och informationsvetenskap
