Global ETD Search

211	Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms Ahlqvist, Oskar January 2023 Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the parties involved. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the main research question, "How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing?", through two sub-questions: 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment uses k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate how well the classifiers distinguish fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the three models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis shows that each feature category had a different impact on model performance, and that the models assign different importances to the features, meaning that some features have greater influence on specific models. Fine-tuning and optimizing the hyperparameters of each model can further enhance its performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms. In particular, random forest showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and their practical applications for fraud detection. Fraud detection Machine learning Random Forest support vector machine Logistic Regression Classification models Affiliate marketing Computer Sciences Datavetenskap (datalogi)
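The evaluation protocol described above, k-fold cross-validation scored by mean AUC-ROC, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the folds are stratified (a common choice for imbalanced fraud labels, though the abstract does not say so), AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve, and `fit_nearest_centroid` is a hypothetical stand-in for the random forest, logistic regression, and SVM models. The data are synthetic.

```python
import random
from typing import Callable, List, Sequence

def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive (fraud) case scores
    higher than a random negative one; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin so that every fold keeps
    roughly the original fraud rate."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def mean_cv_auc(X, y, fit: Callable, k: int = 5, seed: int = 0) -> float:
    """fit(X_train, y_train) must return a scoring function row -> float."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc_roc([y[i] for i in test_idx], [score(X[i]) for i in test_idx]))
    return sum(aucs) / len(aucs)

def fit_nearest_centroid(X_train, y_train):
    """Hypothetical stand-in model: a row scores higher the closer it lies to
    the fraud centroid relative to the non-fraud centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: sqdist(x, c0) - sqdist(x, c1)

# Synthetic demo: the first feature is shifted for the fraud class.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]
print(round(mean_cv_auc(X, y, fit_nearest_centroid), 3))
```

Any model can be plugged in through `fit` as long as it returns a per-row score; AUC depends only on how the scores rank fraud cases above non-fraud cases, not on a decision threshold.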
212	Predicting High-Cap Tech Stock Polarity: A Combined Approach using Support Vector Machines and Bidirectional Encoders from Transformers Grisham, Ian L 01 May 2023 The abundance, accessibility, and scale of data have ushered in an era in which machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One area where many researchers have applied these techniques is the stock market. Yet financial markets are influenced by many factors and are notoriously difficult to predict because of their volatile and multivariate behavior. However, the literature indicates that public sentiment data may have significant predictive power and improve a model's ability to predict intricate trends. In this study, the accuracy of SVM momentum classification was compared between datasets with and without sentiment-analysis features. The results indicated that the datasets containing sentiment features were typically better predictors, yielding improved model accuracy. However, the results did not match the improvements reported in similar research, and further research is required to determine the nature of the relationship between sentiment and higher model performance. Quantitative Finance Sentiment Analysis Support Vector Machine BERT Artificial Intelligence and Robotics Computational Linguistics Computer Sciences Data Science Finance Probability
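The comparison described above, SVM accuracy with and without sentiment features, can be illustrated with a small linear SVM trained by Pegasos-style stochastic sub-gradient descent on the hinge loss. This is a sketch under assumptions: the thesis does not say how its SVM was trained or which kernel it used, and the `price` and `sentiment` features, the labelling rule, and all data below are synthetic and hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularised hinge loss; labels must be -1/+1.
    A constant 1.0 feature is appended so that a bias term is learned."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]      # shrink: gradient of the L2 term
            if margin < 1.0:                             # hinge active: move towards t * x
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w

def accuracy(w: Sequence[float], X, y) -> float:
    def predict(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

# Hypothetical data: the momentum label depends weakly on a price feature
# and more strongly on a sentiment score.
rng = random.Random(42)
rows: List[Tuple[float, float]] = []
labels: List[int] = []
for _ in range(400):
    price, sentiment = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((price, sentiment))
    labels.append(1 if 0.5 * price + sentiment + rng.gauss(0, 0.3) > 0 else -1)

w_with = train_linear_svm(rows[:300], labels[:300])
w_without = train_linear_svm([r[:1] for r in rows[:300]], labels[:300])
acc_with = accuracy(w_with, rows[300:], labels[300:])
acc_without = accuracy(w_without, [r[:1] for r in rows[300:]], labels[300:])
print(f"with sentiment: {acc_with:.2f}, without: {acc_without:.2f}")
```

Dropping the sentiment column is the only difference between the two runs, which mirrors the with/without design of the study.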
213	Development of battery models for on-board health estimation in hybrid vehicles Riesco Refoyo, Javier January 2017 Following the positive market reception of electric and hybrid transport solutions, manufacturers keep developing their vehicles further while continuing to face existing challenges. Understanding how lithium-ion batteries behave is still one of the key factors in the development of hybrid electric vehicles (HEVs), especially given the requirements placed on the battery management system during operation. Hence, this project addresses the need for robust yet reasonably simple and cost-effective battery models for estimating health status while the vehicles are in operation. To this end, a procedure and models are proposed for calculating the state-of-health (SOH) indicators, internal resistance and capacity, and the results are discussed. Two machine-learning-based models are presented, a support vector machine (SVM) and a neural network (NN), together with one equivalent circuit model (ECM). The data used for training and validating the models come from laboratory tests of the batteries, using standard performance tests and real driving cycles over the battery lifespan. In addition, data sets measured in actual heavy-duty vehicles over three years of operation are analysed and compared. In this regard, a study of the battery materials, behaviour, and operating attributes is carried out, highlighting the main aspects and issues that affect model development. The model inputs are signals that can be measured on board the vehicles, such as current, voltage, and temperature, as well as signals derived from them, such as the state of charge (SOC) calculated by the internal battery management unit. Time series of these variables are used for simulation. Signal handling and model implementation are done in MATLAB/Simulink, using some of its built-in functions as well as purpose-developed ones. The models are evaluated and compared using the normalized root mean squared error (NRMSE) of their output voltage profile relative to that of the tested batteries, as well as the error of the internal resistance calculated from the voltage profile for all three models and, for the ECM, its internal parameters. Despite the difficulties with the data, the models ultimately estimate the resistance accurately; the capacity estimation results, however, are omitted from the document because no useful information could be derived from them. Nevertheless, the calculation procedure and other considerations regarding capacity estimation and the data sets are discussed. Finally, conclusions are drawn about the data used, the battery materials, and the methods evaluated: performance tests should be designed to follow the conditions of the driving cycles, the SVM shows higher general performance than the other two methods, and the ECM remains useful. Moreover, the models find the battery with NMC material composition easier to predict than the LFP battery, and its internal resistance also evolves differently. Lithium-ion battery State-of-health Resistance Capacity Materials Model Equivalent circuit model Support vector machine Neural networks Materials Engineering Materialteknik
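A minimal sketch of the two error measures mentioned above. It assumes NRMSE is normalised by the range of the measured voltage (the abstract does not specify the normalisation; dividing by the mean is another common convention) and that internal resistance is estimated from the immediate voltage response to a current step, R = -ΔV/ΔI with discharge current counted as positive. The voltage and current values are hypothetical.

```python
import math
from typing import Sequence

def nrmse(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """RMSE between measured and simulated voltage divided by the measured range
    (assumed normalisation)."""
    if not measured or len(measured) != len(simulated):
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((m - s) ** 2 for m, s in zip(measured, simulated)) / len(measured)
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal is constant; range normalisation undefined")
    return math.sqrt(mse) / span

def step_resistance(v_before: float, v_after: float,
                    i_before: float, i_after: float) -> float:
    """Ohmic resistance from the immediate voltage response to a current step.
    With discharge current positive, terminal voltage drops as current rises,
    so R = -dV/dI."""
    return -(v_after - v_before) / (i_after - i_before)

measured = [3.60, 3.55, 3.50, 3.45]    # hypothetical measured voltages (V)
simulated = [3.61, 3.54, 3.51, 3.44]   # hypothetical model output (V)
print(nrmse(measured, simulated))
print(step_resistance(3.70, 3.64, 0.0, 20.0))   # 60 mV drop for a 20 A step
```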
214	Evaluating Statistical Machine Learning and Deep Learning Algorithms for Anomaly Detection in Chat Messages / Utvärdering av statistiska maskininlärnings- och djupinlärningsalgoritmer för anomalitetsdetektering i chattmeddelanden Freberg, Daniel January 2018 Automatically detecting anomalies in text is of great interest to surveillance entities, since vast amounts of data can be analysed to find suspicious activity. In this thesis, three machine learning algorithms are evaluated while a chat message classifier is implemented for market surveillance. Naive Bayes and Support Vector Machine belong to the statistical class of machine learning algorithms evaluated in this thesis, and both require feature selection; a secondary objective is therefore to find a feature selection technique that lets these algorithms achieve high performance. The Long Short-Term Memory (LSTM) network is the deep learning algorithm evaluated; rather than depending on feature selection, it is trained using word embeddings. All three algorithms achieved high performance, but the findings suggest that Naive Bayes combined with a feature-counting feature selection technique is the most suitable choice for this particular learning problem. / Being able to automatically detect anomalies in text has major implications for companies and authorities that monitor various kinds of communication. In this degree project, three different machine learning algorithms are evaluated for chat message classification in a market surveillance system. Naive Bayes and Support Vector Machine both belong to the statistical class of machine learning algorithms evaluated in the study, and both require selecting which textual features the algorithm should use. A secondary aim of the study is therefore to find a suitable selection technique so that the statistical algorithms perform as well as possible. Long Short-Term Memory Network is the deep learning algorithm evaluated in the study. Instead of using a selection technique, the deep learning algorithm uses word vectors to represent text. The results show that all the evaluated algorithms can achieve high performance for the purpose, in particular Naive Bayes combined with term frequency selection. machine learning NLP deep learning word vectors naive bayes support vector machine LSTM Computer Sciences Datavetenskap (datalogi)
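The combination the thesis found most suitable, Naive Bayes with a feature-counting (term-frequency) feature selection, can be sketched as follows. Whitespace tokenisation, the top-k cutoff, add-one smoothing, and the toy messages and labels are assumptions for illustration, not the thesis's setup.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set

def tokenize(text: str) -> List[str]:
    return text.lower().split()

def select_by_frequency(docs: Sequence[str], k: int) -> Set[str]:
    """Feature counting: keep the k most frequent terms in the training corpus."""
    counts = Counter(tok for d in docs for tok in tokenize(d))
    return {term for term, _ in counts.most_common(k)}

class MultinomialNB:
    """Multinomial Naive Bayes over a selected vocabulary with add-one smoothing."""

    def fit(self, docs: Sequence[str], labels: Sequence[str], vocab: Set[str]) -> "MultinomialNB":
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior: Dict[str, float] = {}
        self.log_lik: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, lab in zip(docs, labels) if lab == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in tokenize(d) if t in vocab)
            total = sum(counts.values()) + len(vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
        return self

    def predict(self, doc: str) -> str:
        toks = [t for t in tokenize(doc) if t in self.vocab]  # unselected terms are ignored
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + sum(self.log_lik[c][t] for t in toks))

# Hypothetical toy chat messages.
train = ["buy shares before the announcement", "delete this chat about the announcement",
         "lunch at noon today", "meeting notes for today",
         "buy before news is public", "see you at lunch"]
labels = ["anomaly", "anomaly", "normal", "normal", "anomaly", "normal"]
vocab = select_by_frequency(train, k=7)
model = MultinomialNB().fit(train, labels, vocab)
print(model.predict("buy before the announcement"), model.predict("lunch today at noon"))
```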
215	Multitemporal Satellite Data for Monitoring Urbanization in Nanjing from 2001 to 2016 Cai, Zipan January 2017 As urbanization accelerates worldwide, the population keeps shifting from rural to urban areas. China, as the country with the largest population, has the highest urban population growth in Asia and in the world. However, urbanization in China is in turn leading to many social issues that reshape the living environment and the cultural fabric. These social issues highlight the challenges of healthy and sustainable urban growth, particularly in the sound planning of urban land use and land cover. It is therefore important to establish a comprehensive set of sustainable urban development strategies to avoid detours in the urbanization process. Faced with these social phenomena, spatial and temporal technologies such as Remote Sensing and Geographic Information Systems (GIS) can help city decision makers make the right choices. Knowledge of land use and land cover changes in rural and urban areas helps identify the urban growth rate and trend both qualitatively and quantitatively, providing a stronger basis for planning and designing cities in a more scientific and environmentally friendly way. This paper analyses urban sprawl in Nanjing, Jiangsu, China, by monitoring urban growth patterns over a study period. From 2001 to 2016, Nanjing Municipality experienced a substantial increase in urban area because of its growing population. A Support Vector Machine (SVM) classifier, a supervised classification method with high accuracy, was used to extract thematic features from multitemporal satellite data, including Landsat 7 ETM+, Landsat 8, and Sentinel-2A MSI. The classification results were interpreted to identify urban sprawl patterns based on the land use and land cover features in 2001, 2006, 2011, and 2016. Two types of change detection analysis, post-classification comparison and change vector analysis (CVA), were performed to explore the detailed extent of urban growth within the study region, and the two methods were compared through accuracy assessment. Finally, based on the change detection results and the current state of urban development, constructive recommendations and future research directions are given. With the proposed methods, the urban land use and land cover changes were successfully captured. The results show a notable change in the urban or built-up land class: the urban area increased by 610.98 km² while the agricultural land area decreased by 766.96 km², indicating land conversion between these land cover classes during the study period. The urban area grew in each sub-period, while the growth rate showed a decreasing trend from 2001 to 2016. Both change detection techniques produced similar distributions of urban expansion in the study area. According to the result images from the two methods, the expanded urban or built-up land in Nanjing is located mainly around the central city, on both sides of the Yangtze River, and in the southwest. The change detection accuracy assessment indicated that post-classification comparison achieved a higher overall accuracy (86.11%) and a higher Kappa coefficient (0.72) than CVA, whose overall accuracy and Kappa coefficient were 75.43% and 0.51, respectively. These results indicate that the strength of agreement between predicted and reference data is ‘good’ for post-classification comparison and ‘moderate’ for CVA. The results also confirmed the expectation from previous studies that determining the CVA threshold empirically always leads to relatively poor change detection accuracy. In general, both change detection techniques were found to be effective and efficient for monitoring surface changes across the land cover classes during the study period, although each has its advantages and disadvantages for change detection, particularly for urban expansion. Urbanization Nanjing Remote Sensing GIS Support Vector Machine Post-Classification Comparison Change Vector Analysis Environmental Sciences Miljövetenskap
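Overall accuracy and the Kappa coefficient reported above are both computed from a confusion matrix: Cohen's κ = (p_o - p_e)/(1 - p_e), where p_o is the overall accuracy and p_e the agreement expected by chance from the row and column totals. The sketch below shows that calculation together with the core of change vector analysis, flagging a pixel as changed when the magnitude of its spectral change vector exceeds a threshold. The matrix, pixel values, and threshold are hypothetical; the thesis set its CVA threshold empirically. As a consistency check on the reported figures, an overall accuracy of 0.8611 with κ = 0.72 implies p_e ≈ 0.50.

```python
import math
from typing import List, Sequence, Tuple

def accuracy_and_kappa(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """cm[i][j] = number of reference samples of class i predicted as class j.
    Returns (overall accuracy p_o, Cohen's kappa)."""
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)

def implied_chance_agreement(p_o: float, kappa: float) -> float:
    """Invert the kappa formula: p_e = (p_o - kappa) / (1 - kappa)."""
    return (p_o - kappa) / (1 - kappa)

def cva_change_mask(before: Sequence[Sequence[float]], after: Sequence[Sequence[float]],
                    threshold: float) -> List[bool]:
    """Per pixel, the Euclidean length of the change vector (after - before)
    across all bands; True marks the pixel as changed."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p0))) > threshold
            for p0, p1 in zip(before, after)]

print(accuracy_and_kappa([[40, 10], [10, 40]]))
print(round(implied_chance_agreement(0.8611, 0.72), 2))
```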
216	Non-intrusive driver drowsiness detection system. Abas, Ashardi B. January 2011 The development of technologies for preventing drowsiness at the wheel is a major challenge in the field of accident avoidance systems. Preventing drowsiness during driving requires a method for accurately detecting a decline in driver alertness and a method for alerting and refreshing the driver. As a detection method, the authors have developed a system that uses image processing to analyse road-lane images from a video camera, combined with steering wheel angle data collected from a car simulation system. The main contribution of this study is a novel algorithm for drowsiness detection and tracking, based on combining information from a road vision system with vehicle performance parameters. The algorithm is refined with support vector machine classification, which detects the level of drowsiness more precisely and yields a robust and accurate drowsiness warning system. The Support Vector Machine (SVM) classification technique assesses drowsiness levels using non-intrusive systems built from standard-equipment sensors, with the aim of reducing road accidents caused by drowsy drivers. This detection system provides a non-contact technique for judging various levels of driver alertness and facilitates early detection of a decline in alertness during driving. The presented results are based on a drowsiness database covering almost 60 hours of driving data. All vehicle parameters were collected in a driving simulator. Using all the features available from a real vehicle, an SVM drowsiness detection model was constructed. After several improvements, the classification results gave a very good indication of drowsiness. / Title page is not included. 
Driving Drowsiness Accident avoidance systems Image processing Drowsiness detection Driver alertness Algorithms
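The abstract does not list the vehicle features fed to the SVM. As a hedged illustration only, the sketch below computes two windowed steering features of the kind discussed in driver-monitoring work: the standard deviation of the steering angle and a count of steering reversals larger than a small gap. The window length, the 2-degree gap, and the angle trace are assumptions, not the thesis's parameters.

```python
import statistics
from typing import List, Sequence, Tuple

def steering_reversals(angles: Sequence[float], gap: float = 2.0) -> int:
    """Count direction changes where the wheel moves back by at least `gap`
    degrees from the last local extreme (gap is an assumed value)."""
    if not angles:
        return 0
    reversals, direction, extreme = 0, 0, angles[0]
    for a in angles[1:]:
        if direction == 0:                  # no direction established yet
            if abs(a - extreme) >= gap:
                direction = 1 if a > extreme else -1
                extreme = a
        elif direction == 1:                # angle currently increasing
            if a > extreme:
                extreme = a
            elif extreme - a >= gap:
                reversals += 1
                direction, extreme = -1, a
        else:                               # angle currently decreasing
            if a < extreme:
                extreme = a
            elif a - extreme >= gap:
                reversals += 1
                direction, extreme = 1, a
    return reversals

def window_features(angles: Sequence[float], window: int) -> List[Tuple[float, int]]:
    """Per non-overlapping window: (population std of angle, reversal count),
    i.e. one hypothetical feature vector per window for a classifier."""
    return [(statistics.pstdev(angles[s:s + window]), steering_reversals(angles[s:s + window]))
            for s in range(0, len(angles) - window + 1, window)]

print(window_features([0, 3, 5, 1, -2, 2, 2.5, 0], window=4))
```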
217	APP DEVELOPMENT, DATA COLLECTION AND MACHINE LEARNING IN DETERMINING MEDICINE DOSAGE FOR PARKINSON'S DISEASE Olsson, Daniel, Eriksson, Jonathan, Soltani, Sedigheh January 2022 Parkinson’s disease is a neurodegenerative disorder that affects approximately 0.2% of the population, with motor disabilities as its most prominent feature. A characteristic of the disease is lowered dopamine levels, which are often countered by oral intake of a medication called Levodopa. However, to keep dopamine levels steady, a patient needs to take the medication regularly throughout the day. As the disease and its treatment progress, finding the correct prescription becomes more difficult. This project continues a previous project by students at Uppsala University, in which a machine learning model based on a Support Vector Machine (SVM) classified data collected from a handheld accelerometer as indicating that the user was either under- or overdosed for Parkinson’s disease. The goal of this project was to achieve a similar result by developing a mobile app. The app lets the user follow a path displayed on the screen with a finger while it collects touch data in the form of timestamped coordinates. The app development proved successful, and the collected data were stored in a database hosted on Firebase, Google's cloud service. From there, the data could be downloaded and imported into MATLAB, where an SVM model was set up and trained. After training on data collected from both healthy individuals and patients with Parkinson’s disease, the SVM distinguished Parkinson’s disease data from healthy data with a success rate of 91.7%. Parkinson's Disease Support Vector Machine Machine Learning APP Development React Native Electrical Engineering and Electronics
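The app records timestamped touch coordinates while the user traces an on-screen path. A hedged sketch of turning such a trace into two simple kinematic features that an SVM could consume, mean finger speed and mean distance from the target path, follows. The feature choice, the straight-line target path, and the units (pixels, milliseconds) are assumptions; the abstract does not say which features were extracted.

```python
import math
from typing import NamedTuple, Sequence, Tuple

class Touch(NamedTuple):
    x: float  # screen x coordinate (assumed pixels)
    y: float  # screen y coordinate (assumed pixels)
    t: float  # timestamp (assumed milliseconds)

def mean_speed(trace: Sequence[Touch]) -> float:
    """Total path length traced by the finger divided by the elapsed time."""
    if len(trace) < 2 or trace[-1].t == trace[0].t:
        raise ValueError("need at least two touches spanning a non-zero time")
    dist = sum(math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(trace, trace[1:]))
    return dist / (trace[-1].t - trace[0].t)

def mean_path_deviation(trace: Sequence[Touch], p0: Tuple[float, float],
                        p1: Tuple[float, float]) -> float:
    """Mean perpendicular distance from each touch to the straight line through
    p0 and p1 (a hypothetical target path)."""
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    return sum(abs((x1 - x0) * (y0 - p.y) - (x0 - p.x) * (y1 - y0)) / length
               for p in trace) / len(trace)

trace = [Touch(0, 0, 0), Touch(3, 4, 10), Touch(6, 8, 20)]
print(mean_speed(trace), mean_path_deviation(trace, (0, 0), (10, 0)))
```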
218	Identifying the beginning of a kayak race using velocity signal data Kvedaraite, Indre January 2023 A kayak is a small watercraft propelled by a person sitting inside the hull and paddling with a double-bladed paddle. Kayaking can be recreational, but it is also a competitive sport in races and even at the Olympic Games, so it is important to be able to analyse athletes’ performance during a race. To study races better, some kayaking teams and organizations have attached sensors to their kayaks. These sensors record various data, which are later used to generate performance reports. However, to generate such reports, the coach must manually pinpoint the beginning of the race, because the sensors also collect data before the actual race begins, such as practice runs, warm-up sessions, or periods of standing and waiting. Identifying the race start and the race sequence in the data is tedious and time-consuming work that could be automated. This project proposes an approach for identifying kayak races in velocity signal data with the help of a machine learning algorithm. The approach combines several techniques: signal preprocessing, a machine learning algorithm, and a programmatic approach. Three machine learning algorithms were evaluated for detecting the race sequence: Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and Random Forest (RF). SVM outperformed the other algorithms with an accuracy of 95%. A programmatic approach was proposed to identify the start time of the race. The average error of the proposed approach is 0.24 seconds. The approach was integrated into a web-based application whose user interface lets coaches automatically detect the beginning of a kayak race and the race signal sequence. 
kayak race velocity signal machine learning support vector machine k nearest neighbours random forest Computer Sciences Datavetenskap (datalogi)
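The abstract describes a programmatic step that pinpoints the start time once the race segment has been found, but does not give its rules. A hedged sketch of what such a step could look like: smooth the velocity with a centred moving average (no phase lag), then, within the segment already classified as race, return the first sample whose smoothed speed exceeds a threshold. The window half-width, the 2.0 m/s threshold, the 10 Hz sampling rate, and the trace are hypothetical.

```python
from typing import List, Optional, Sequence

def centered_moving_average(signal: Sequence[float], half: int) -> List[float]:
    """Mean over samples [i - half, i + half], truncated at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def race_start_time(velocity: Sequence[float], race_from: int, race_to: int,
                    hz: float = 10.0, threshold: float = 2.0, half: int = 2) -> Optional[float]:
    """First time in seconds inside the classified race segment [race_from, race_to)
    where smoothed speed exceeds `threshold`; None if it never does.
    All default parameter values are assumptions."""
    smooth = centered_moving_average(velocity, half)
    for i in range(race_from, min(race_to, len(smooth))):
        if smooth[i] > threshold:
            return i / hz
    return None

# Hypothetical trace: 2 s of standing still, then racing at 4 m/s.
v = [0.0] * 20 + [4.0] * 20
print(race_start_time(v, 0, len(v)))
```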
219	Sentimental Analysis of Cyberbullying Tweets with SVM Technique Thanikonda, Hrushikesh, Koneti, Kavya Sree January 2023 Background: Cyberbullying involves the use of digital technologies to harass, humiliate, or threaten individuals or groups. This form of bullying can occur on various platforms such as social media, messaging apps, gaming platforms, and mobile phones. With the outbreak of COVID-19, social media use increased drastically, and this upsurge was accompanied by cyberbullying, making it a pressing issue that needs to be addressed. Sentiment analysis involves identifying and categorizing emotions and opinions expressed in text data using natural language processing and machine learning techniques. SVM is a machine learning algorithm that has been widely used for sentiment analysis because of its accuracy and efficiency. Objectives: The main objective of this study is to use SVM for sentiment analysis of cyberbullying tweets and to evaluate its performance. The study aimed to determine the feasibility of using SVM for sentiment analysis and to assess its accuracy in detecting cyberbullying. Methods: A quantitative research method is used, and the data are analyzed statistically. The dataset, obtained from Kaggle, contains cyberbullying tweets. The data are preprocessed and used to train and test an SVM model, which is evaluated on the test set using accuracy, precision, recall, and F1 score to determine how well it detects cyberbullying. Results: The results showed that SVM is a suitable technique for sentiment analysis of cyberbullying tweets. The model had an accuracy of 82.3% in detecting cyberbullying, with a precision of 0.82, recall of 0.82, and F1-score of 0.83. Conclusions: The study demonstrates the feasibility of using SVM for sentiment analysis of cyberbullying tweets. The high accuracy of the SVM model suggests that it can be used to build automated systems for detecting cyberbullying. The findings highlight the importance of developing tools to detect and address cyberbullying online. Sentiment analysis with SVM has the potential to make a significant contribution to the fight against cyberbullying. Cyberbullying tweets Dataset Data preprocessing Machine Learning Supervised Learning Support Vector Machine Validation. Computer Sciences Datavetenskap (datalogi)
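The four measures reported above are computed from confusion-matrix counts, with F1 defined as the harmonic mean of precision and recall. A minimal sketch with hypothetical counts follows; the abstract does not state whether its figures are binary or averaged over several classes, so a macro-averaged F1 (one-vs-rest per class, then an unweighted mean) is included as an assumption.

```python
from typing import Dict, Sequence, Tuple

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": precision, "recall": recall, "f1": f1}

def macro_f1(per_class: Sequence[Tuple[int, int, int, int]]) -> float:
    """Unweighted mean of per-class F1, each class scored one-vs-rest
    from its own (tp, fp, fn, tn) counts."""
    return sum(binary_metrics(*c)["f1"] for c in per_class) / len(per_class)

print(binary_metrics(tp=80, fp=20, fn=20, tn=80))
```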
220	Data mining inom tillverkningsindustrin : En fallstudie om möjligheten att förutspå kvalitetsutfall i produktionslinjer [Data mining in the manufacturing industry: A case study on the possibility of predicting quality outcomes in production lines] Janson, Lisa, Mathisson, Minna January 2021 In this work, a case study was carried out at Volvo Group in Köping. With the transition to Industry 4.0, the opportunities to use machine learning as a tool for analysing industrial data and further developing industrial production are increasing. This work aims to investigate the possibility of predicting quality outcomes in the compression of hub and mainshaft. The method comprises implementing three machine learning models and evaluating their performance relative to one another. When the models were applied to assembly data from the factory, the results were poor, indicating that the quality outcome cannot be predicted from the included variables. The reasons behind this result were examined, and the conclusion was that it was probably because the models were unable to find relationships in the data or because no relationship existed in the dataset. To determine which of these two factors was decisive, a fabricated dataset was created in which three new variables were introduced. The fabricated values of these variables were generated so that there was synthetic causality between two of the variables and the quality outcome. When the models were applied to the fabricated data, all of them succeeded in identifying the synthetic relationship. From this it was concluded that the poor result was not due to the models' performance but to the absence of any relationship in the dataset of real assembly data. This supported the assessment that if component traceability were increased in the future, and more machines in the production line generated data to a connected system, the study could be carried out again with more variables and a larger dataset. Support vector machine was the best-performing model, given the performance metrics used in this study. The fact that the models in this study identified the relationship in the data when it was known to exist motivates using these models in future studies. Finally, with improved traceability and an increasingly connected factory, machine learning models could be used as components in larger systems to achieve efficiency gains. / As the transition to Industry 4.0 proceeds, the possibility of using machine learning as a tool for further developing industrial production becomes increasingly significant. In this paper, a case study has been conducted at Volvo Group in Köping to investigate the feasibility of predicting quality outcomes in the compression of hub and mainshaft. Three different machine learning models were implemented and compared with each other. A dataset containing data from Volvo’s production site in Köping was used to train and evaluate the models. However, the low evaluation scores obtained indicate that the quality outcome of the compression could not be predicted from the variables in that dataset alone. Therefore, a dataset with three additional variables, consisting of fabricated values with a known causal relationship between two of the variables and the quality outcome, was also used. The purpose was to investigate whether the poor evaluation metrics resulted from the absence of a pattern between the included variables and the quality outcome, or from the models being unable to find the pattern. The performance of the models, when trained and evaluated on the fabricated dataset, indicates that the models were in fact able to find the pattern that was known to exist. Support vector machine was the best-performing model, given the evaluation metrics chosen in this study. Consequently, if component traceability were enhanced in the future and more machines in the production line transmitted production data to a connected system, the study could be conducted again with additional variables and a larger dataset. The fact that the models in this study found patterns in the dataset when such patterns were known to exist motivates the use of the same models. Furthermore, with enhanced component traceability and more machines transmitting production data to a connected system, machine learning models could potentially be used as components in larger business monitoring systems to achieve efficiencies. Data mining machine learning quality control industrial production logistic regression k-nearest neighbor support vector machine Computer and Information Sciences Data- och informationsvetenskap
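The validation idea described above, checking a model on fabricated data with a known causal link before trusting a null result on real data, can be sketched with a k-nearest-neighbour classifier (one of the model types listed in the keywords). The three fabricated variables, the rule linking the first two to the outcome, the 75/25 split, and k = 5 are hypothetical; the thesis's actual fabricated variables are not described in the abstract.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                x: Sequence[float], k: int = 5) -> int:
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def holdout_accuracy(X, y, k: int = 5, split: float = 0.75) -> float:
    cut = int(len(X) * split)
    hits = sum(knn_predict(X[:cut], y[:cut], x, k) == t for x, t in zip(X[cut:], y[cut:]))
    return hits / (len(X) - cut)

def make_dataset(n: int, causal: bool, seed: int = 0) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Three fabricated variables in [0, 1). If causal, the outcome depends on the
    first two (a + b > 1); otherwise it is a coin flip unrelated to the inputs."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b, c = rng.random(), rng.random(), rng.random()
        label = int(a + b > 1.0) if causal else int(rng.random() < 0.5)
        X.append((a, b, c))
        y.append(label)
    return X, y

acc_causal = holdout_accuracy(*make_dataset(400, causal=True))
acc_noise = holdout_accuracy(*make_dataset(400, causal=False))
print(f"known pattern: {acc_causal:.2f}, no pattern: {acc_noise:.2f}")
```

If a model scores well on the fabricated data but near chance on the real data, the study's reasoning attributes the real-data result to a missing relationship rather than to the model.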
