31 |
Insurance Fraud Detection using Unsupervised Sequential Anomaly Detection / Detektion av försäkringsbedrägeri med oövervakad sekvensiell anomalitetsdetektion. Hansson, Anton; Cedervall, Hugo. January 2022.
Fraud is a common crime within the insurance industry, and insurance companies want to identify fraudulent claimants quickly, since fraudulent claims often result in higher premiums for honest customers. Because the digital transformation has increased the sheer volume and complexity of available data, manual fraud detection is no longer practical. This work aims to automate the detection of fraudulent claimants and to gain practical insights into fraudulent behavior using unsupervised anomaly detection, which, compared to supervised methods, allows for a more cost-efficient and practical application in the insurance industry. To obtain interpretable results and benefit from the temporal dependencies in human behavior, we propose two variations of LSTM-based autoencoders to classify sequences of insurance claims. Autoencoders can provide feature importances that give insight into the models' predictions, which is essential when models are put into practice. This approach relies on the assumption that outliers in the data are fraudulent. The models were trained and evaluated on a dataset we engineered using data from a Swedish insurance company, where the few labeled frauds that existed were used solely for validation and testing. Experimental results show state-of-the-art performance, and further evaluation shows that the combination of autoencoders and LSTMs is efficient but performs similarly to the employed baselines. This thesis provides an entry point for interested practitioners to learn key aspects of anomaly detection within fraud detection by thoroughly discussing the subject at hand and the details of our work. / Conducted digitally via Zoom.
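For readers who want a concrete starting point, the sketch below shows a sequence autoencoder of the kind the thesis describes, scoring sequences by reconstruction error so that high-error outliers can be flagged. It is a minimal PyTorch sketch, not the authors' implementation: the layer sizes, the ten-step sequences of six engineered claim features, and the two-standard-deviation threshold are all illustrative assumptions.

```python
# Minimal sketch of a sequence autoencoder for anomaly scoring (PyTorch).
# Dimensions, layer sizes, and the threshold rule are illustrative assumptions,
# not the thesis authors' actual configuration.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)                           # summarize the sequence in the last hidden state
        repeated = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # repeat the code for every time step
        decoded, _ = self.decoder(repeated)
        return self.output(decoded)                           # reconstruction of the input sequence

def anomaly_scores(model, x):
    """Per-sequence reconstruction error; high error suggests an outlier (potential fraud)."""
    model.eval()
    with torch.no_grad():
        recon = model(x)
        return ((recon - x) ** 2).mean(dim=(1, 2))

# Example: score a batch of 8 claim sequences, each 10 steps of 6 engineered features.
model = LSTMAutoencoder(n_features=6)
batch = torch.randn(8, 10, 6)
scores = anomaly_scores(model, batch)
flagged = scores > scores.mean() + 2 * scores.std()  # simple thresholding assumption
```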
|
32 |
Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques. Aljifri, Ahmed. January 2024.
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn within an e-commerce platform. These algorithms were chosen because of the characteristics of the dataset and the distinct perspective and value each algorithm provides. Iterative examinations of the models, encompassing preprocessing techniques, feature engineering, and rigorous evaluation, were conducted. Logistic Regression showed moderate predictive capability but lagged in accurately identifying potential churners because of its assumption of linearity between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving superior sensitivity and specificity (98.22% and 96.35%, respectively) and outperforming the other models. Random Forest, with a sensitivity of 91.75% and a specificity of 95.83%, excelled in specificity but lagged slightly in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. Preprocessing techniques differed in performance across models, emphasizing the importance of tailored preprocessing. The study's findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges such as customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigate customer attrition, and promote growth in e-commerce platforms.
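A minimal sketch of the kind of three-model comparison the study describes is shown below, using scikit-learn. The synthetic data, the class balance, and the hyperparameters are assumptions for illustration only; the study's actual e-commerce dataset and tuning are not reproduced.

```python
# Illustrative three-model comparison on a churn-style dataset (scikit-learn).
# The synthetic data, class balance, and hyperparameters are assumptions for
# demonstration; the study's e-commerce dataset and tuning are not reproduced.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced binary target: roughly 15% churners (an assumption).
X, y = make_classification(n_samples=5000, n_features=12, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.3f}, specificity={tn / (tn + fp):.3f}")
```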
|
33 |
Využití umělé inteligence v technické diagnostice / Utilization of artificial intelligence in technical diagnostics. Konečný, Antonín. January 2021.
The diploma thesis focuses on the use of artificial intelligence methods for evaluating the fault condition of machinery. The evaluated data come from a vibrodiagnostic model for simulating static and dynamic unbalance. Machine learning methods, specifically supervised learning, are applied. The thesis describes the Spyder software environment, its alternatives, and the Python programming language, in which the scripts are written. It contains an overview and description of the libraries used (Scikit-learn, SciPy, Pandas, ...) and of the methods: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT), and Random Forest classifiers (RF). The classification results are visualized in a confusion matrix for each method. The appendix includes the scripts written for feature engineering, hyperparameter tuning, evaluation of learning success, and classification with visualization of the results.
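The sketch below illustrates the kind of supervised-classification workflow the thesis describes, training KNN, SVM, DT, and RF classifiers and printing a confusion matrix for each. Synthetic features stand in for the vibrodiagnostic data, and the class labels, hyperparameter grids, and train/test split are illustrative assumptions rather than the thesis setup.

```python
# Sketch of a supervised fault-classification workflow with confusion matrices
# (scikit-learn). Synthetic features stand in for the vibrodiagnostic data; the
# three machine states and the hyperparameter grids are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical classes: balanced, static unbalance, dynamic unbalance.
X, y = make_classification(n_samples=900, n_features=10, n_informative=6,
                           n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

classifiers = {
    "KNN": GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "SVM": GridSearchCV(SVC(), {"C": [1, 10], "gamma": ["scale", 0.1]}),
    "DT": DecisionTreeClassifier(random_state=1),
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)  # GridSearchCV also tunes hyperparameters via cross-validation
    print(name)
    print(confusion_matrix(y_test, clf.predict(X_test)))
```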
|
34 |
Exploring the Feasibility of Exercise Detection on the Exxentric kBox Platform / Undersökning av möjligheten att detektera övningar på Exxentric kBox-platformen. Mehr, Mahyar. January 2023.
Flywheel training is an increasingly popular training method that aids the recovery process and promotes strength development while reducing the risk of re-injury. In addition, automatic exercise classification lets athletes monitor and track their training progress effortlessly, helping them train consistently and reach their fitness goals. This thesis investigates the feasibility and accuracy of developing a machine-learning model for classifying exercises performed on Exxentric kBox machines. The objective is to assess the model's accuracy and determine whether the features provided by the Exxentric app are sufficient for constructing a robust classifier. To lay a strong foundation for the investigation, the research begins with a comprehensive literature review of exercise recognition studies. An exploratory data analysis is then conducted to gain insight into the characteristics of the exercise data. The data preparation phase involves techniques such as cleaning, feature engineering, scaling, sampling, and encoding to optimize the data for modeling, and signal processing techniques are employed to extract relevant features from the exercise data. A testing protocol is established, consisting of two sets of ten exercises, with each exercise performed for a randomized number of repetitions between 5 and 12. Data collection is carried out with ten participants using the Exxentric App on their smartphones. Several types of classifiers are trained on data from the Exxentric database and tested on the data collected on-site, using the generated features. In addition, a CNN classifier that uses only angular velocity as input is explored. A comparative analysis is performed on the models' evaluation metrics. In conclusion, although accurate classification of all ten exercises was not fully achieved, the CNN model relying on angular velocity as input showed promising results. Notably, squats were predicted correctly 95% of the time, the most prominent result. The model also identified bent-over rows (72%), deadlifts (72.2%), standing calf raises (70.6%), and biceps curls (67%) with considerable accuracy. Further research is warranted to improve the effectiveness and accuracy of exercise classification models, including exploring alternative input methods and refining feature engineering techniques.
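As a rough illustration of the CNN approach mentioned above, the sketch below defines a small 1D convolutional classifier that takes only a window of angular-velocity samples as input. The window length, channel sizes, and ten-class output are assumptions for illustration, not the thesis model or the Exxentric data format.

```python
# Minimal sketch of a 1D CNN exercise classifier that uses only angular velocity
# as input (PyTorch). Window length, channel sizes, and the ten-class output are
# illustrative assumptions.
import torch
import torch.nn as nn

class AngularVelocityCNN(nn.Module):
    def __init__(self, n_classes: int = 10, window: int = 200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (window // 4), 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        # x: (batch, 1, window) -- a fixed-length window of angular-velocity samples
        return self.classifier(self.features(x))

# Example: classify a batch of 4 windows of 200 angular-velocity samples each.
model = AngularVelocityCNN()
logits = model(torch.randn(4, 1, 200))
predicted_exercise = logits.argmax(dim=1)
```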
|
35 |
Stylometry: Quantifying Classic Literature For Authorship Attribution: A Machine Learning Approach. Yousif, Jacob; Scarano, Donato. January 2024.
Classic literature is rich linguistically, historically, and culturally, which makes it valuable for future studies. Consequently, this project selected a set of 48 classic books and conducted a stylometric analysis on them, adopting an approach from related work: the books were divided into text segments, the segments were quantified, and the quantified values were analyzed to understand the books' linguistic attributes. Beyond this analysis, the project conducted classification tasks with two further objectives. First, the study used the quantified values of the text segments in classification tasks with advanced models such as LightGBM and TabNet to assess the applicability of this approach to authorship attribution. Second, the study applied a state-of-the-art model, RoBERTa, to the segmented texts themselves to evaluate its performance in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. Regarding authorship attribution, the results suggest that segmenting and quantifying text using stylometric analysis and supervised machine learning algorithms is practical for such tasks, although the approach may still require further improvements to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution.
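The sketch below illustrates the segment-quantify-classify pipeline described above. The stylometric features are a small illustrative subset, the 500-word segment length is an assumption, and scikit-learn's gradient boosting stands in for the LightGBM and TabNet models used in the study; the `books` variable is a hypothetical list of (text, author) pairs loaded elsewhere.

```python
# Sketch of a segment-quantify-classify pipeline for authorship attribution.
# The feature set is a small illustrative subset of stylometric measures, and
# scikit-learn's GradientBoostingClassifier stands in for LightGBM/TabNet.
import re
from collections import Counter

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def segment(text: str, words_per_segment: int = 500):
    """Split a book into fixed-length word segments (segment length is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_segment])
            for i in range(0, len(words) - words_per_segment + 1, words_per_segment)]

def stylometric_features(segment_text: str):
    """Quantify one segment with a few simple stylometric measures."""
    words = re.findall(r"[A-Za-z']+", segment_text.lower())
    sentences = [s for s in re.split(r"[.!?]+", segment_text) if s.strip()]
    return [
        np.mean([len(w) for w in words]),              # mean word length
        len(Counter(words)) / len(words),              # type-token ratio
        np.mean([len(s.split()) for s in sentences]),  # mean sentence length in words
        segment_text.count(",") / len(words),          # comma rate
    ]

def build_dataset(books):
    """books: hypothetical list of (full_text, author_label) pairs loaded elsewhere."""
    X, y = [], []
    for text, author in books:
        for seg in segment(text):
            X.append(stylometric_features(seg))
            y.append(author)
    return np.array(X), np.array(y)

# Example usage, assuming `books` is available:
# X, y = build_dataset(books)
# print(cross_val_score(GradientBoostingClassifier(), X, y, cv=5).mean())
```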
|
36 |
The Development and Application of Mass Spectrometry-based Structural Proteomic Approaches to Study Protein Structure and Interactions. Makepeace, Karl A.T. 26 August 2022.
Proteins and their intricate network of interactions are fundamental to many molecular processes that govern life. Mass spectrometry-based structural proteomics represents a powerful set of techniques for characterizing protein structures and interactions. The last decade has witnessed a large-scale adoption in the application of these techniques toward solving a variety of biological questions. Addressing these questions has often been coincident with the further development of these techniques.
Insight into the structures of individual proteins and their interactions with other proteins in a proteome-wide context has been made possible by recent developments in the relatively new field of chemical crosslinking combined with mass spectrometry. In these experiments, crosslinking reagents are used to capture protein-protein interactions by forming covalent linkages between proximal amino acid residues. The crosslinked proteins are then enzymatically digested into peptides, and the covalently coupled crosslinked peptides are identified by mass spectrometry. These identified crosslinked peptides thus provide evidence of interacting regions within or between proteins.
In this dissertation, the development of tools and methods that facilitate this powerful technique is described. The primary arc of this work follows the development and application of mass spectrometry-based approaches for the identification of protein crosslinks, ranging from those that exist endogenously to those that are introduced synthetically. First, the development of a novel strategy for the comprehensive determination of naturally occurring protein crosslinks, in the form of disulfide bonds, is described. Second, the application of crosslinking reagents to create synthetic crosslinks in proteins, coupled with molecular dynamics simulations, is explored in order to structurally characterize the intrinsically disordered tau protein. Third, improvements to a crosslinking mass spectrometry method for defining a protein-protein interactome in a complex sample are developed. Altogether, these approaches represent a toolset that allows researchers to access information about protein structure and interactions.
|