Global ETD Search

381	Error detection in blood work : A comparison of self-supervised deep learning-based models / Felupptäckning i blodprov : En jämförelse av självbevakade djupinlärningsmodeller Vinell, Paul January 2022 (has links) Errors in medical testing may cause serious problems that have the potential to severely hurt patients. There are many machine learning methods to discover such errors. However, due to the rarity of errors, it is difficult to collect enough examples to learn from them. It is therefore important to focus on methods that do not require human labeling. This study presents a comparison of neural network-based models for the detection of analytical errors in blood tests containing five markers of cardiovascular health. The results show that error detection in blood tests using deep learning is a promising preventative mechanism. It is also shown that it is beneficial to take a multivariate approach to error detection so that the model examines several blood tests at once. There may also be benefits to looking at multiple health markers simultaneously, although this benefit is more pronounced when looking at individual blood tests. The comparison shows that a supervised approach significantly outperforms outlier detection methods on error detection. Given the effectiveness of the supervised model, there is reason to further study and potentially employ deep learning-based error detection to reduce the risk of errors. / Fel i medicinska tester kan orsaka allvarliga problem som har potential att allvarligt skada patienter. Det finns många maskininlärningsmetoder för att upptäcka sådana fel. Men på grund av att felen är sällsynta så är det svårt att samla in tillräckligt många exempel för att lära av dem. Det är därför viktigt att fokusera på metoder som inte kräver mänsklig märkning. Denna studie presenterar en jämförelse av neurala nätverksbaserade modeller för detektering av analytiska fel i blodprov som innehåller fem markörer för kardiovaskulär hälsa. 
Resultaten visar att feldetektering i blodprov med hjälp av djupinlärning är en lovande förebyggande mekanism. Det har också visat sig att det är fördelaktigt att använda ett multivariat tillvägagångssätt för feldetektering så att modellen undersöker flera blodprov samtidigt. Det kan också finnas fördelar med att titta på flera hälsomarkörer samtidigt, även om denna fördel är tydligare när modellen tittar på individuella blodprov. Jämförelsen visar att ett övervakat tillvägagångssätt avsevärt överträffar metoder för detektering av extremvärden vid feldetektering. Med tanke på effektiviteten av den övervakade modellen finns det anledning att studera tillvägagångssättet vidare och eventuellt använda djupinlärningsbaserad feldetektering för att minska risken för fel. anomaly detection outlier detection error detection machine learning deep learning blood work blood tests felupptäckning extremvärden maskininlärning djupinlärning blodprov Computer Sciences Datavetenskap (datalogi)
382	Scalable Nonparametric L1 Density Estimation via Sparse Subtree Partitioning Sandstedt, Axel January 2023 (has links) We consider the construction of multivariate histogram estimators for any density f, seeking to minimize the L1 distance between the estimator and the true underlying density using arbitrarily large sample sizes. Theory for such estimators exists, and the early stages of distributed implementations are available. Our main contributions are new algorithms that seek to optimise out unnecessary network communication taking place in the distributed stages of the construction of such estimators, using sparse binary tree arithmetic. density estimation scalable density estimation nonparametric density estimation L1 L_1 anomaly detection regression analysis Probability Theory and Statistics Sannolikhetsteori och statistik
383	Scalable and explainable self-supervised motif discovery in temporal data Bakhtiari Ramezani, Somayeh 08 December 2023 (has links) (PDF) The availability of a scalable and explainable rule extraction technique via motif discovery is crucial for identifying the health states of a system. Such a technique can enable the creation of a repository of normal and abnormal states of the system and identify the system’s state as we receive data. In complex systems such as ECG, each activity session can consist of a long sequence of motifs that form different global structures. As a result, applying machine learning algorithms without first identifying the local patterns is not feasible and would result in low performance. Thus, extracting unique local motifs and establishing a database of prototypes or signatures is a crucial first step in analyzing long temporal data that reduces the computational cost and overcomes imbalanced data. The present research aims to streamline the extraction of motifs and add explainability to their analysis by identifying their differences. We have developed a novel framework for unsupervised motif extraction. We also offer a robust algorithm to identify unique motifs and their signatures, coupled with a proper distance metric to compare the signatures of partially similar motifs. Defining such distance metrics allows us to assign a degree of semblance between two motifs that may have different lengths or contain noise. We have tested our framework against five different datasets and observed excellent results, including extraction of motifs from 100 million samples in 8.02 seconds, 99.90% accuracy in self-supervised ECG data classification, and an average error of 16.66% in RUL prediction of bearing failure. Motif discovery Temporal data Self-supervised Pattern Clustering Pattern detection Predictive maintenance Anomaly detection ECG data Artificial Intelligence and Robotics Data Science
384	A Review of Anomaly Detection Techniques for Heterogeneous Datasets / Undersökning av Anomalidetekteringsmetoder för Heterogena Datamängder Piroti, Shirwan January 2021 (has links) Anomaly detection is a field of study that is closely associated with machine learning; it is the process of finding irregularities in datasets. Developing and maintaining multiple machine learning models for anomaly detection takes time and can be expensive. One proposed solution is to combine all datasets and create a single model. This creates a heterogeneous dataset with a wide variation in its distribution, making it difficult to find anomalies in the dataset. The objective of this thesis is therefore to identify a framework that is suitable for anomaly detection in heterogeneous datasets. A selection of five methods was implemented in this project: 2 supervised learning approaches and 3 unsupervised learning approaches. These models are trained on 3 synthetic datasets that have been designed to be heterogeneous, with an imbalance between the classes, as anomalies are rare events. The performance of the models is evaluated with the AUC and the F1-score, as well as by observing the Precision-Recall Curve. The results make it evident that anomaly detection in heterogeneous datasets is a challenging task. The best performing approach was a random forest model where the class imbalance problem had been addressed by generating synthetic samples of the anomaly class with a generative adversarial network. / Anomalidetektering är ett studieområde som är starkt förknippat med maskininlärning och det kan beskrivas som processen att hitta avvikelser i datamängder. Att utveckla och underhålla flera maskininlärningsmodeller tar tid och kan vara kostsamt. Ett förslag för att lösa dessa problem är att kombinera alla dataset och skapa endast en modell. 
Detta leder till att datamängden blir heterogen i dess fördelning och gör det mer utmanande att skapa en modell som kan detektera anomalier. Syftet i denna tes är att identifiera ett ramverk som är lämpligt för anomalidetektering i heterogena datamängder. Ett urval av fem metoder tillämpades i detta projekt - 2 metoder inom övervakad inlärning och 3 metoder inom oövervakad inlärning. Dessa modeller är tränade på syntetiska datamängder som är framtagna så att de är heterogena i dess fördelning och har en urbalans mellan klasserna då anomalier är sällsynta händelser. Modellernas prestanda evalueras genom att beräkna dess AUC och F1-värde, samt observera Precision-Recall kurvan. Resultaten gör det tydligt att anomalidetektering i heterogena datamängder är ett utmanande uppdrag. Den model som presterade bäst var en random forest model där urbalansen mellan klasserna var omhändertagen genom att generera syntetiska observation av anomaliklassen med hjälp av en generativ advarserial network. Anomaly Detection Heterogeneous GAN BiGAN Autoencoder Random Forest Isolation Forest Anomalidetektering Heterogen GAN BiGAN Autoencoder Random Forest Isolation Forest Computational Mathematics Beräkningsmatematik
385	Anomaly Detection in the EtherCAT Network of a Power Station : Improving a Graph Convolutional Neural Network Framework Barth, Niklas January 2023 (has links) In this thesis, an anomaly detection framework is assessed and fine-tuned to detect and explain anomalies in a power station, where EtherCAT, an Industrial Control System, is employed for monitoring. The chosen framework is based on a previously published Graph Neural Network (GNN) model, utilizing attention mechanisms to capture complex relationships between diverse measurements within the EtherCAT system. To address the challenges in graph learning and improve model performance and computational efficiency, the study introduces a novel similarity thresholding approach. This approach dynamically selects the number of neighbors for each node based on their similarity instead of adhering to a fixed 'k' value, thus making the learning process more adaptive and efficient. Further in the exploration, the study integrates Extreme Value Theory (EVT) into the framework to set the anomaly detection threshold and assess its effectiveness. The effect of temporal features on model performance is examined, and the role of seconds of the day as a temporal feature is notably highlighted. These various methodological innovations aim to refine the application of the attention based GNN framework to the EtherCAT system. The results obtained in this study illustrate that the similarity thresholding approach significantly improves the model's F1 score compared to the standard TopK approach. The inclusion of seconds of the day as a temporal feature led to modest improvements in model performance, and the application of EVT as a thresholding technique was explored, although it did not yield significant benefits in this context. 
Despite the limitations, including the utilization of a single-day dataset for training, the thesis provides valuable insights for the detection of anomalies in EtherCAT systems, contributing both to the literature and the practitioners in the field. It lays the groundwork for future research in this domain, highlighting key areas for further exploration such as larger datasets, alternative anomaly detection techniques, and the application of the framework in streaming data environments. / I denna avhandling utvärderas och finslipas ett ramverk för att detektera och förklara anomalier på ett kraftverk, där EtherCAT, ett industriellt styrsystem, används för övervakning. Det valda ramverket är baserat på en tidigare publicerad graf neurala nätverksmodell (GNN) som använder uppmärksamhetsmekanismer för att fånga komplexa samband mellan olika mätningar inom EtherCAT-systemet. För att hantera utmaningar inom grafiskt lärande och förbättra modellens prestanda och beräkningseffektivitet introducerar studien en ny metod för likhetsgränsdragning. Denna metod väljer dynamiskt antalet grannar för varje nod baserat på deras likhet istället för att hålla sig till ett fast 'k'-värde, vilket gör inlärningsprocessen mer anpassningsbar och effektiv. I en vidare undersökning integrerar studien extremvärdesteori (EVT) i ramverket för att sätta tröskeln för detektering av anomalier och utvärdera dess effektivitet. Effekten av tidsberoende egenskaper på modellens prestanda undersöks, och sekunder av dagen som en tidsberoende egenskap framhävs särskilt. Dessa olika metodologiska innovationer syftar till att förädla användningen av det uppmärksamhetsbaserade GNN-ramverket på EtherCAT-systemet. Resultaten som erhållits i denna studie illustrerar att likhetsgränsdragning väsentligt förbättrar modellens F1-poäng jämfört med den standardiserade TopK-metoden. 
Inkluderingen av sekunder av dagen som en tidsberoende egenskap ledde till blygsamma förbättringar i modellens prestanda, och användningen av EVT som en tröskelmetod undersöktes, även om den inte gav några betydande fördelar i detta sammanhang. Trots begränsningarna, inklusive användningen av ett dataset för endast en dag för träning, ger avhandlingen värdefulla insikter för detektering av anomalier i EtherCAT-system, och bidrar både till litteraturen och praktiker inom området. Den lägger grunden för framtida forskning inom detta område, och belyser nyckelområden för ytterligare utforskning såsom större dataset, alternativa tekniker för detektering av anomalier och tillämpningen av ramverket i strömmande data-miljöer. Unsupervised Learning Multivariate Time Series Graph Convolutional Neural Networks Anomaly Detection Industrial Control System EtherCAT Power Station Electricity Grid Computer and Information Sciences Data- och informationsvetenskap
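The difference between a fixed TopK neighbourhood and the similarity thresholding described in record 385 can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or threshold the thesis uses, so cosine similarity, the value `tau=0.9`, the toy embeddings and both function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topk_neighbors(emb, k):
    """Fixed-size neighbourhood: every node keeps its k most similar nodes."""
    result = {}
    for i, u in enumerate(emb):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(emb) if j != i),
            reverse=True,
        )
        result[i] = [j for _, j in scored[:k]]
    return result

def threshold_neighbors(emb, tau):
    """Adaptive neighbourhood: a node keeps only nodes with similarity >= tau."""
    return {
        i: [j for j, v in enumerate(emb) if j != i and cosine(u, v) >= tau]
        for i, u in enumerate(emb)
    }

# Toy sensor embeddings (hypothetical): three similar sensors and one unrelated one.
emb = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0)]
fixed = topk_neighbors(emb, k=2)
adaptive = threshold_neighbors(emb, tau=0.9)
```

With these toy values the unrelated fourth sensor is still forced to take two neighbours under TopK, whereas thresholding leaves it with none, so the number of edges per node adapts to the data.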
386	An autonomous host-based intrusion detection and prevention system for Android mobile devices. Design and implementation of an autonomous host-based Intrusion Detection and Prevention System (IDPS), incorporating Machine Learning and statistical algorithms, for Android mobile devices Ribeiro, José C.V.G. January 2019 (has links) This research work presents the design and implementation of a host-based Intrusion Detection and Prevention System (IDPS) called HIDROID (Host-based Intrusion Detection and protection system for andROID) for Android smartphones. It runs completely on the mobile device, with a minimal computation burden. It collects data in real-time, periodically sampling features that reflect the overall utilisation of scarce resources of a mobile device (e.g. CPU, memory, battery, bandwidth, etc.). The Detection Engine of HIDROID adopts an anomaly-based approach by exploiting statistical and machine learning algorithms. That is, it builds a data-driven model for benign behaviour and looks for the outliers considered as suspicious activities. Any observation failing to match this model triggers an alert and the preventive agent takes proper countermeasure(s) to minimise the risk. The key novel characteristic of the Detection Engine of HIDROID is the fact that it requires no malicious data for training or tuning. In fact, the Detection Engine implements the following two anomaly detection algorithms: a variation of K-Means algorithm with only one cluster and the univariate Gaussian algorithm. Experimental test results on a real device show that HIDROID is well able to learn and discriminate normal from anomalous behaviour, demonstrating a very promising detection accuracy of up to 0.91, while maintaining false positive rate below 0.03. Finally, it is noteworthy to mention that to the best of our knowledge, publicly available datasets representing benign and abnormal behaviour of Android smartphones do not exist. 
Thus, in the context of this research work, two new datasets were generated in order to evaluate HIDROID. / Fundação para a Ciência e Tecnologia (FCT-Portugal) with reference SFRH/BD/112755/2015, European Regional Development Fund (FEDER), through the Competitiveness and Internationalization Operational Programme (COMPETE 2020), Regional Operational Program of the Algarve (2020), Fundação para a Ciência e Tecnologia; i-Five .: Extensão do acesso de espectro dinâmico para rádio 5G, POCI-01-0145-FEDER-030500, Instituto de telecomunicações, (IT-Portugal) as the host institution. Security Intrusion detection Android 5G Prevention Host-based Malware detection Host-based IDS Statistical anomaly detection Machine learning
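Record 386 names a univariate Gaussian algorithm trained without malicious data. A minimal sketch of that general technique, not of HIDROID itself, is given below; the two features (CPU and memory readings), the sample values and the threshold rule are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def fit_gaussian(benign):
    """Estimate a mean and standard deviation per feature from benign samples only."""
    columns = list(zip(*benign))
    mus = [mean(c) for c in columns]
    sigmas = [max(pstdev(c), 1e-9) for c in columns]  # guard against zero variance
    return mus, sigmas

def log_density(x, mus, sigmas):
    """Sum of independent univariate Gaussian log-densities for one observation."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - ((v - m) ** 2) / (2 * s * s)
        for v, m, s in zip(x, mus, sigmas)
    )

def is_anomalous(x, mus, sigmas, threshold):
    """Flag an observation whose log-density under the benign model is too low."""
    return log_density(x, mus, sigmas) < threshold

# Hypothetical benign samples: (CPU load in percent, memory use in MB).
benign = [(20.0, 300.0), (22.0, 310.0), (19.0, 295.0), (21.0, 305.0), (20.5, 298.0)]
mus, sigmas = fit_gaussian(benign)

# Assumed threshold rule: slightly below the least likely benign sample.
threshold = min(log_density(x, mus, sigmas) for x in benign) - 1.0
```

Because the model is fitted on benign behaviour alone, no labelled attack data is needed; the threshold is the only parameter that trades detection rate against false positives.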
387	AI/ML Development for RAN Applications : Deep Learning in Log Event Prediction / AI/ML-utveckling för RAN-applikationer : Deep Learning i Log Event Prediction Sun, Yuxin January 2023 (has links) Since many log tracing applications and diagnostic commands are now available on base station nodes, event logs can easily be collected, parsed and structured for network performance analysis. In order to improve the In Service Performance of customer networks, a sequential machine learning model can be trained, tested, and deployed on each node to learn from past events and predict future crashes or failures. This thesis project focuses on the evaluation and analysis of the effectiveness of deep learning models in predicting log events. It explores the application of a stacked long short-term memory (LSTM) based model in capturing temporal dependencies and patterns within log event data. In addition, it investigates the probability distribution of the next event from the logs and estimates the event trigger time to predict the future node restart event. This thesis project aims to improve node availability time in Ericsson base stations and to contribute to further applications of deep learning techniques in log event prediction. A framework with two main phases is utilized to analyze and predict the occurrence of restart events based on the sequence of events. In the first phase, we perform natural language processing (NLP) on the log content to obtain the log key, and then identify the sequence that will cause the restart event from the sequence of node events. In the second phase, we analyze the sequences of events that resulted in a restart, and predict how many minutes in the future the restart event will occur. Experimental results show that our framework achieves no less than 73% accuracy on restart prediction and more than 1.5 minutes of lead time before a restart. Moreover, our framework also performs well for non-restart events. 
/ Eftersom många loggspårningsapplikationer och diagnostiska kommandon nu finns tillgängliga på noder vid basstationen, kan händelseloggar enkelt samlas in, analyseras och struktureras för analys av nätverksprestanda. För att förbättra kundnätverkets In Service Performance kan en sekventiell maskininlärningsmodell tränas, testas och distribueras på varje nod för att lära av tidigare händelser för att förutsäga framtida krascher eller ett fel. Detta examensarbete fokuserar på utvärdering och analys av effektiviteten hos modeller för djupinlärning för att förutsäga logghändelser. Den utforskar tillämpningen av staplade långtidsminne (LSTM)-baserad modell för att fånga tidsmässiga beroenden och mönster i logghändelsedata. Dessutom undersöker den sannolikhetsfördelningen för nästa händelse från loggarna och uppskattar händelseutlösningstiden för att förutsäga den framtida omstartshändelsen för noden. Detta examensarbete syftar till att förbättra nodtillgänglighetstiden i Ericssons basstation och bidra till ytterligare tillämpning inom logghändelseprediktion med hjälp av djupinlärningstekniker. Ett ramverk med två huvudfaser används för att analysera och förutsäga förekomsten av omstartshändelser baserat på händelseförloppet. I den första fasen utför vi naturlig språkbehandling (NLP) på logginnehållet för att erhålla loggnyckeln och identifierar sedan sekvensen som kommer att orsaka omstartshändelsen från sekvensnodhändelserna. I den andra fasen analyserar vi dessa händelseförlopp som resulterade i omstart och förutsäger hur många minuter i framtiden omstartshändelsen kommer att inträffa. Experimentresultat visar att vårt ramverk uppnår inte mindre än 73% noggrannhet vid omstartsförutsägelse och mer än 1,5 minuters ledtid vid omstart. Dessutom fungerar vårt ramverk bra för händelser som inte startar om. 
LSTM Anomaly Detection Failure Prediction Log Mining Deep Learning LSTM Anomali Detection Failure Prediction Log Mining Deep Learning Computer and Information Sciences Data- och informationsvetenskap
388	Self-Learning Methodology for Failure Detection in an Oil-Hydraulic Press : Predictive maintenance Guillen Rosaperez, Diego Alonso January 2020 (has links) Deep Learning methods have dramatically improved the state of the art across multiple fields, such as speech recognition and object detection, among others. Nevertheless, their application to signal processing, where data is frequently unlabelled, has received relatively little attention. In this field, a set of sub-optimal techniques is often used today. They usually require an expert to manually extract features to analyse, which is a knowledge- and labour-intensive process. Thus, a self-learning technique could improve current methods. Moreover, certain machines in a factory are particularly complex, such as an oil-hydraulic press. Here, its sensors can only identify a few failures by setting up some thresholds, but they commonly cannot detect wear on its internal components. So, a self-learning technique would be required to detect anomalies related to deterioration. The concept is to determine the condition of a machine and to predict breakdowns by analysing patterns in the measurements from its sensors. This document proposes a self-learning methodology that uses a deep learning model to predict failures in such a machine. The core idea is to train an algorithm that can identify by itself the relevant features to extract from a work cycle, and relate them to a part that will break down. The conducted evaluation focuses on an example case where a hydraulic accumulator fails. As a result, it was possible to forecast its breakdown two weeks in advance. Finally, the proposed method provides explanations at every step, acknowledging their importance in industrial applications. Also, some considerations and limitations of this technique are stated to help guide the expectations of some stakeholders in a factory, i.e. a (Global) Process Owner. 
/ Deep Learning-metoder har dramatiskt förbättrat det senaste inom flera fält, såsom taligenkänning, objektdetektering, bland andra. Ändå har dess tillämpning på signalbehandling, där data ofta är omärkt, fått relativt lite uppmärksamhet. I detta fält används numera ofta en uppsättning suboptimala tekniker. De kräver vanligtvis en expert för att manuellt extrahera funktioner för att analysera, vilket är en kunskaps och arbetsintensiv process. Således kan en självlärande teknik förbättra nuvarande metoder. Dessutom är vissa maskiner i en fabrik särskilt komplexa, såsom en oljehydraulisk press. Här kan dess sensorer bara identifiera några fel genom att ställa in vissa trösklar, men de kan vanligtvis inte upptäcka slitage på dess interna komponenter. Så, en självlärande teknik skulle krävas för att upptäcka avvikelser relaterade till försämring. Konceptet är att bestämma maskinens tillstånd och att förutsäga haverier genom att analysera mönster i mätningarna från deras sensorer. Detta dokument föreslår en självlärningsmetodik som använder en djupinlärningsmodell för att förutsäga fel i en sådan maskin. Kärnidén är att träna en algoritm som i sig kan identifiera de relevanta funktionerna som ska extraheras i en arbetscykel och att relatera dem till en del som kommer att bryta ner. Den genomförda utvärderingen fokuserar på ett exempel på fall där en hydraulisk ackumulator misslyckas. Som ett resultat var det möjligt att förutse dess fördelning två veckor i förväg. Slutligen ger den föreslagna metoden förklaringar i varje steg, efter att ha erkänt deras betydelse i industriella applikationer. Några överväganden och begränsningar av denna teknik anges också som stöd för att vägleda förväntningarna hos vissa intressenter i en fabrik, dvs. en (global) processägare. Signal Processing Deep Learning Machine Learning Predictive Maintenance Anomaly Detection. Signalbehandling djupinlärning maskininlärning förutsägbart underhåll upptäckt av avvikelser. 
Computer and Information Sciences Data- och informationsvetenskap
389	Anomaly Detection and Revenue Loss Estimation in Accounting Data Edholm, Gustav January 2020 (has links) Loss of revenue due to erroneous invoicing is a serious problem for many companies in the repair and maintenance industry. Revenue loss can occur in many ways, for example by consistently charging the wrong hourly price for services. If a company is experiencing revenue loss, it is incredibly important to detect it, find where it is happening, and estimate the size of it in order to treat it. The goal of this work is to find statistical methods for detecting incorrectly charged services in a dataset of invoices, and estimate the loss of revenue in the same dataset. The dataset used comes from a real company experiencing revenue loss through incorrectly charged prices for services, and thus represents a real world instance of this problem. Multiple machine learning methods with different levels of supervision are tested for detecting anomalous invoice items and estimating revenue loss using raw invoice data. Neural network regression, and different decision tree regression methods, as well as an ensemble of these are tested and compared. The dataset has ground truth labels for each price, thus results are compared to real world targets. It is found that an ensemble using a weighted average of predictions from neural network regression and gradient boosted decision tree regression to predict the charged prices in an invoice dataset performs anomaly detection most reliably. On the top 1000 anomaly candidates, this method flags anomalies correctly 87% of the time, catching 45% of all anomalies. Moreover, in terms of estimating revenue loss, using a neural network to perform regression, a revenue loss error of just 13% is achieved. / Förlorad omsättning till följd av felaktig fakturering är ett allvarligt problem för vissa företag i service- och reparationsbranschen. Detta kan uppstå på många sätt, till exempel genom konsekvent felaktig prissättning av tjänster. 
Om ett företag har stor förlust av omsättning är det otroligt viktigt att upptäcka det, hitta var det sker, och uppskatta storleken av förlusten för att kunna behandla den. Målet med detta arbete är att hitta statistiska metoder för att identifiera felaktigt prissatta tjänster i ett dataset av fakturor, och uppskatta förlorad omsättning i datasetet. Datasetet som används kommer från ett företag som förlorar omsättning på grund av just felfakturerat pris på tjänster, och representerar därför en verklig instans av detta problem. Ett flertal maskininlärningsmetoder, med olika grader av vägledning, används för att upptäcka felaktiga fakturarader och uppskatta förlorad omsättning i omärkt fakturadata. Regression med neuronnät, och olika beslutsträdmetoder såväl som en ensemble av dessa testas och jämförs. Datasetet har sanningsenliga etiketter till varje rad, därmed kan resultaten jämföras och utvärderas mot korrekta priser. Vi finner att en ensemble av ett neuralnät och ett gradientförstärkt beslutsträd för regression identifierar felaktiga prissättningar mest pålitligt. På de 1000 mest sannolika felen har denna metod rätt på 87%, vilket fångar 45% av alla fel. Vidare, med hänsyn till förlorad omsättning finner vi att ett neuralnät som utför regression uppnår ett fel på endast 13% i sitt estimat av förlorad omsättning. Machine Learning Anomaly Detection Regression Neural Network Invoice Revenue Loss Maskininlärning Avvikelsedetektion Regression Neuralnät Faktura Omsättningsförlust Computer and Information Sciences Data- och informationsvetenskap
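The two figures quoted in record 389 (87% correct among the top 1000 candidates, 45% of all anomalies caught) read as precision and recall at k = 1000. The sketch below shows how such figures are computed and what they jointly imply. Treating the second figure as recall is an interpretation of the abstract, the toy scores and labels are invented, and the implied anomaly count is back-of-the-envelope arithmetic on rounded percentages, not a number reported by the thesis.

```python
def precision_recall_at_k(scores, labels, k):
    """Rank items by anomaly score and evaluate the k highest-scoring ones."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])   # true anomalies among the top k
    total = sum(labels)                         # true anomalies overall
    return hits / k, (hits / total if total else 0.0)

# Toy example (hypothetical data): 6 invoice rows, 3 of them truly anomalous.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
p, r = precision_recall_at_k(scores, labels, k=3)

# Arithmetic on the abstract's rounded figures:
# 87% of 1000 candidates are correct, and those hits are 45% of all anomalies.
implied_hits = 0.87 * 1000
implied_total = implied_hits / 0.45   # roughly 1,900 anomalies in the dataset
```

In the toy example two of the top three candidates are true anomalies, giving precision and recall of 2/3 each.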
390	Anomaly Detection for Root Cause Analysis in System Logs using Long Short-Term Memory / Anomalidetektion för Grundorsaksanalys i Loggar från Mjukvara med hjälp av Long Short-Term Memory von Hacht, Johan January 2021 (has links) Many software systems are under test to ensure that they function as expected. Sometimes, a test can fail, and in that case, it is essential to understand the cause of the failure. However, as systems grow larger and become more complex, this task can become non-trivial and potentially take much time. Therefore, even partially, automating the process of root cause analysis can save time for the developers involved. This thesis investigates the use of a Long Short-Term Memory (LSTM) anomaly detector in system logs for root cause analysis. The implementation is evaluated in a quantitative and a qualitative experiment. The quantitative experiment evaluates the performance of the anomaly detector in terms of precision, recall, and F1 measure. Anomaly injection is used to measure these metrics since there are no labels in the data. Additionally, the LSTM is compared with a baseline model. The qualitative experiment evaluates how effective the anomaly detector could be for root cause analysis of the test failures. This was evaluated in interviews with an expert in the software system that produced the log data that the thesis uses. The results show that the LSTM anomaly detector achieved a higher F1 measure than the proposed baseline implementation thanks to its ability to detect unusual events and events happening out of order. The qualitative results indicate that the anomaly detector could be used for root cause analysis. In many of the evaluated test failures, the expert being interviewed could deduce the cause of the failure. Even if the detector did not find the exact issue, a particular part of the software might be highlighted, meaning that it produces many anomalous log messages. 
With this information, the expert could contact the people responsible for that part of the application for help. In conclusion, the anomaly detector automatically collects the necessary information for the expert to perform root cause analysis. As a result, it could save the expert time when performing this task. With further improvements, it could also be possible for non-experts to utilise the anomaly detector, reducing the need for an expert. / Många mjukvarusystem testas för att försäkra att de fungerar som de ska. Ibland kan ett test misslyckas och i detta fall är det viktigt att förstå varför det gick fel. Detta kan bli problematiskt när mjukvarusystemen växer och blir mer komplexa eftersom att denna uppgift kan bli icke trivial och ta mycket tid. Om man skulle kunna automatisera felsökningsprocessen skulle det kunna spara mycket tid för de involverade utvecklarna. Denna rapport undersöker användningen av en Long Short-Term Memory (LSTM) anomalidetektor för grundorsaksanalys i loggar. Implementationen utvärderas genom en kvantitativ och kvalitativ undersökning. Den kvantitativa undersökningen utvärderar prestandan av anomalidetektorn med precision, recall och F1 mått. Artificiellt insatta anomalier används för att kunna beräkna dessa mått eftersom att det inte finns etiketter i den använda datan. Implementationen jämförs också med en annan simpel anomalidetektor. Den kvalitativa undersökningen utvärderar hur användbar anomalidetektorn är för grundorsaksanalys för misslyckade tester. Detta utvärderades genom intervjuer med en expert inom mjukvaran som producerade datan som användes i denna rapport. Resultaten visar att LSTM anomalidetektorn lyckades nå ett högre F1 mått jämfört med den simpla modellen. Detta tack vare att den kunde upptäcka ovanliga loggmeddelanden och loggmeddelanden som skedde i fel ordning. De kvalitativa resultaten pekar på att anomalidetektorn kan användas för grundorsaksanalys för misslyckade tester. 
I många av de misslyckade tester som utvärderades kunde experten hitta anledningen till att testet misslyckades genom det som hittades av anomalidetektorn. Även om detektorn inte hittade den exakta orsaken till att testet misslyckades så kan den belysa en viss del av mjukvaran. Detta betyder att just den delen av mjukvaran producerade många anomalier i loggarna. Med denna information kan experten kontakta andra personer som känner till den delen av mjukvaran bättre för hjälp. Anomalidetektorn samlar automatiskt in den information som är viktig för att experten ska kunna utföra grundorsaksanalys. Tack vare detta kan experten spendera mindre tid på denna uppgift. Med vissa förbättringar skulle det också kunna vara möjligt för mindre erfarna utvecklare att använda anomalidetektorn. Detta minskar behovet för en expert. Anomaly detection Root cause analysis System logs Long Short-Term Memory Machine learning Anomalidetektion Grundorsaksanalys System loggar Long Short-Term Memory Maskininlärning Computer Sciences Datavetenskap (datalogi)
