301

PRAAG Algorithm in Anomaly Detection

Zhang, Dongyang January 2016
Anomaly detection has been one of the most important applications of data mining, widely applied in industries such as finance, medicine, telecommunications, and even manufacturing. In many scenarios, data arrive as a high-volume stream, so it is preferable to analyze the data without storing all of them. In other words, the key is to improve the space efficiency of algorithms, for example by extracting a statistical summary of the data. In this thesis, we study the PRAAG algorithm, a collective anomaly detection algorithm based on quantile features of the data, so its space efficiency essentially depends on that of the underlying quantile algorithm. First, the thesis investigates quantile summary algorithms that provide quantile information about a dataset without storing all the data points. Then, we implement the selected algorithms and run experiments to test their performance. Finally, the report focuses on experiments with PRAAG to understand how its parameters affect performance and to compare it with other anomaly detection algorithms. In conclusion, the GK algorithm provides a more space-efficient way to estimate quantiles than simply storing all data points. Also, PRAAG is effective in terms of True Prediction Rate (TPR) and False Prediction Rate (FPR) compared with a baseline algorithm, CUSUM. In addition, there are many possible improvements to be investigated, such as parallelizing the algorithm.
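
To illustrate the space-saving idea behind the GK (Greenwald–Khanna) summary the thesis settles on, here is a minimal, simplified Python sketch of an ε-approximate quantile summary in the spirit of GK. The class name and compression schedule are our own illustrative choices, not the thesis's implementation:

```python
import math

class GKSummary:
    """Simplified Greenwald-Khanna eps-approximate quantile summary.

    Keeps tuples [value, g, delta] sorted by value; the full GK algorithm
    guarantees queried ranks are within eps*n of the true rank.
    """
    def __init__(self, eps=0.01):
        self.eps, self.n, self.tuples = eps, 0, []

    def insert(self, v):
        i = 0
        while i < len(self.tuples) and self.tuples[i][0] < v:
            i += 1
        # observations at the extremes are exact; interior ones get rank slack
        delta = 0 if i in (0, len(self.tuples)) else math.floor(2 * self.eps * self.n)
        self.tuples.insert(i, [v, 1, delta])
        self.n += 1
        if self.n % int(1 / (2 * self.eps)) == 0:
            self._compress()

    def _compress(self):
        cap = math.floor(2 * self.eps * self.n)
        i = len(self.tuples) - 2
        while i >= 1:
            g_next, d_next = self.tuples[i + 1][1], self.tuples[i + 1][2]
            if self.tuples[i][1] + g_next + d_next <= cap:
                self.tuples[i + 1][1] += self.tuples[i][1]  # merge into successor
                del self.tuples[i]
            i -= 1

    def query(self, q):
        """Return a stored value whose rank approximates q*n."""
        target = q * self.n
        rmin, prev = 0, self.tuples[0][0]
        for v, g, d in self.tuples:
            rmin += g
            if rmin + d > target + self.eps * self.n:
                return prev
            prev = v
        return prev

s = GKSummary(eps=0.01)
for x in range(20000):
    s.insert((x * 37) % 1001)          # a scrambled stream of values 0..1000
print(s.query(0.5), len(s.tuples))     # median estimate; summary size << n
```

The point of the sketch is the trade-off the abstract describes: the summary keeps a number of tuples that depends on ε rather than on the stream length.
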
302

Water Anomaly Detection Using Federated Machine Learning

Wallén, Melker, Böckin, Mauricio January 2021
With the rapid increase of Internet of Things (IoT) devices, demand for new machine learning algorithms and models has risen. The focus of this project is implementing a federated learning (FL) algorithm to detect anomalies in measurements made by a water-monitoring IoT sensor. The FL algorithm trains across a collection of decentralized IoT devices, each using the local data acquired from its specific sensor. The local machine learning models are then uploaded to a mutual server and aggregated into a global model. The global model is sent back to the sensors and used as a template when training starts again locally. In this project, we only had access to one physical sensor, which forced us to simulate sensors virtually by splitting the data gathered by the single existing sensor. To deal with the long, sequential data gathered by the sensor, a long short-term memory (LSTM) network was used. This is a special type of artificial neural network (ANN) capable of learning long-term dependencies. After analyzing the obtained results, it became clear that FL has the potential to produce good results, provided that more physical sensors are deployed. / Bachelor's thesis in electrical engineering 2021, KTH, Stockholm
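
The aggregation step described above is, in common FL practice, a size-weighted parameter average (FedAvg-style). The sketch below is a framework-agnostic illustration in NumPy; the virtual-sensor split mirrors the report's approach of dividing one sensor's stream, while the "local training" step is a stand-in for the actual LSTM updates:

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """FedAvg-style aggregation: size-weighted average of parameter arrays."""
    total = sum(client_sizes)
    return [
        sum((n / total) * params[i] for params, n in zip(client_params, client_sizes))
        for i in range(len(client_params[0]))
    ]

# Simulate sensors by splitting one sensor's stream, as in the project.
rng = np.random.default_rng(0)
stream = rng.normal(size=3000)
virtual_clients = np.array_split(stream, 3)

global_params = [np.zeros((4, 4)), np.zeros(4)]   # toy model weights
for round_ in range(5):
    local_params, sizes = [], []
    for data in virtual_clients:
        # stand-in for local LSTM training: each client perturbs the template
        updated = [p + 0.1 * rng.normal(size=p.shape) for p in global_params]
        local_params.append(updated)
        sizes.append(len(data))
    global_params = federated_average(local_params, sizes)

print([p.shape for p in global_params])
```
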
303

Anomaly Detection using LSTM Neural Networks and Naive Bayes Classifiers in Multi-Variate Time-Series Data from a Bolt Tightening Tool

Selander, Karl-Filip January 2021
In this thesis, an anomaly detection framework has been developed to aid in the maintenance of tightening tools. The framework is built using LSTM networks and Gaussian naive Bayes classifiers. The suitability of LSTM networks for multi-variate sensor data and time-series prediction as a basis for anomaly detection has been explored. Current literature and research are mostly concerned with uni-variate data, where LSTM-based approaches have had variable but often good results. However, most real-world settings with sensor networks, such as the environment and tool from which this thesis's data are gathered, are multi-variate. Thus, there is a need to research the effectiveness of the LSTM model in this setting. The thesis has emphasized the need for well-defined evaluation metrics for anomaly detection approaches and the difficulties of defining anomalies and anomaly datasets, as well as illustrated the effectiveness of LSTM networks in multi-variate environments.
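
One way to read the framework's two pieces together: a forecaster produces per-channel prediction residuals, and a Gaussian naive Bayes classifier labels each time step from those residuals. The sketch below is hypothetical (synthetic data, and a one-step persistence forecaster standing in where the thesis uses an LSTM), but the residual-plus-classifier shape matches the described pipeline:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Synthetic 3-channel tool signal with two injected anomalous segments.
T, D = 1000, 3
X = np.cumsum(0.1 * rng.normal(size=(T, D)), axis=0) + np.sin(
    np.linspace(0, 60, T))[:, None]
y = np.zeros(T, dtype=int)
for a, b in [(400, 430), (950, 1000)]:
    X[a:b] += rng.normal(3.0, 1.0, size=(b - a, D))
    y[a:b] = 1

# Stand-in forecaster: predict each step from the previous one
# (an LSTM's one-step-ahead predictions would slot in here).
pred = np.vstack([X[:1], X[:-1]])
residuals = np.abs(X - pred)            # per-channel absolute prediction errors

# Train the Gaussian naive Bayes classifier on residual features.
split = 800
clf = GaussianNB().fit(residuals[:split], y[:split])
flags = clf.predict(residuals[split:])
print("flagged", int(flags.sum()), "of", len(flags), "test steps")
```
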
304

Jet Printing Quality Improvement Through Anomaly Detection Using Machine Learning

Lind, Henrik, Janssen, Jacob January 2021
This case study examined emitted sound and actuated piezoelectric current in a solder-paste jet printing machine to determine whether quality degradation could be detected with an autoencoder machine learning model. An autoencoder was used to detect anomalies, in non-real-time, defined as a diameter drift from a target diameter measured with an averaging window. A sensor and data collection system existed for the piezoelectric current, which previous studies have shown can be used to predict quality degradation in similar machines, and a microphone was proposed as a new sensor to monitor the system. The sound was preprocessed with a Fast Fourier Transform to extract information about the existing frequencies. The results of the model, visualized through reconstruction-error plots and an Area Under the Curve score, show that the autoencoder successfully detected conspicuous anomalies. The study indicated that anomalies can be detected prior to solder-paste supply failure using the sound. When the temperature was varied or when the jetting head nozzle was clogged by residual solder paste, the sound model identified most anomalies, although the current network showed better performance.
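
To make the preprocessing concrete: each sound window is reduced to a magnitude spectrum, a model trained only on normal windows reconstructs the spectra, and a large reconstruction error flags an anomaly. This is a hedged sketch on synthetic audio; scikit-learn's MLPRegressor trained to reproduce its own input stands in for the study's autoencoder:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
sr, win = 8000, 256

def frame(signal, win):
    n = len(signal) // win
    return signal[: n * win].reshape(n, win)

def spectra(frames):
    return np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrum per window

# Normal jetting sound: a stable tone plus noise; anomaly: a shifted tone.
t = np.arange(sr * 4) / sr
normal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.normal(size=t.size)
anomal = np.sin(2 * np.pi * 600 * t[:sr]) + 0.1 * rng.normal(size=sr)

X_train = spectra(frame(normal, win))
X_test = np.vstack([X_train[:20], spectra(frame(anomal, win))])

ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=3000, random_state=0)
ae.fit(X_train, X_train)                          # learn to reconstruct the input

err_train = np.mean((ae.predict(X_train) - X_train) ** 2, axis=1)
err_test = np.mean((ae.predict(X_test) - X_test) ** 2, axis=1)
threshold = np.percentile(err_train, 99)          # cutoff from normal data only
print("anomalous windows:", int((err_test > threshold).sum()), "of", len(err_test))
```
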
305

Anomaly Detection for Water Quality Data

YAN, YAN January 2019
Real-time water quality monitoring using automated systems with sensors is becoming increasingly common, which enables and demands timely identification of unexpected values. Technical issues create anomalies, which at the rate of incoming data can prevent the manual detection of problematic data. This thesis deals with the problem of anomaly detection for water quality data using machine learning and statistical learning approaches. Anomalies in data can cause serious problems in posterior analysis and lead to poor decisions or incorrect conclusions. Five time-series anomaly detection techniques have been analyzed: local outlier factor (machine learning), isolation forest (machine learning), robust random cut forest (machine learning), seasonal hybrid extreme studentized deviate (statistical learning), and exponential moving average (statistical learning). Extensive experimental analysis of these techniques has been performed on data sets collected from sensors deployed in a wastewater treatment plant. The results are very promising. In the experiments, three approaches successfully detected anomalies in the ammonia data set. With the temperature data set, the local outlier factor successfully detected all twenty-six outliers, whereas the seasonal hybrid extreme studentized deviate detected only one anomaly point. The exponential moving average identified ten time ranges with anomalies, eight of which cover a total of fourteen anomalies. The reproducible experiments demonstrate that local outlier factor is a feasible approach for detecting anomalies in water quality data. Isolation forest and robust random cut forest also assign high anomaly scores to the anomalies. The result of the primary experiment confirms that local outlier factor is much faster than isolation forest, robust random cut forest, seasonal hybrid extreme studentized deviate, and exponential moving average. / Thesis / Master of Computer Science (MCS)
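
Two of the five techniques are available directly in scikit-learn. The snippet below is an illustrative run on synthetic sensor readings with injected spikes, not the thesis's experiment; the rolling-mean feature is our own simple choice:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)

# Synthetic "ammonia" readings with a daily cycle and a few injected spikes.
t = np.arange(2000)
series = 5 + np.sin(2 * np.pi * t / 288) + 0.1 * rng.normal(size=t.size)
spike_idx = rng.choice(t.size, size=8, replace=False)
series[spike_idx] += rng.choice([-3.0, 3.0], size=8)

# Feature per point: the value and its deviation from a rolling mean.
smooth = np.convolve(series, np.ones(13) / 13, mode="same")
X = np.column_stack([series, series - smooth])

lof_flags = LocalOutlierFactor(n_neighbors=30).fit_predict(X) == -1
iso_flags = IsolationForest(contamination=0.005, random_state=0).fit_predict(X) == -1

print("LOF flagged: ", np.flatnonzero(lof_flags))
print("IF flagged:  ", np.flatnonzero(iso_flags))
print("true spikes: ", np.sort(spike_idx))
```
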
306

Data Analytics for Statistical Learning

Komolafe, Tomilayo A. 05 February 2019
The prevalence of big data has rapidly changed the usage and mechanisms of data analytics within organizations. Big data is a widely used term without a clear definition. The difference between big data and traditional data can be characterized by four Vs: velocity (the speed at which data is generated), volume (the amount of data generated), variety (the data can take on different forms), and veracity (the data may be of poor or unknown quality). As many industries begin to recognize the value of big data, organizations try to capture it through means such as side-channel data in a manufacturing operation, unstructured text data reported by healthcare personnel, demographic information of households from census surveys, and the range of communication data that define communities and social networks. Big data analytics generally follows this framework: first, a digitized process generates a stream of data; this raw data stream is pre-processed to convert the data into a usable format; and the pre-processed data is analyzed using statistical tools. In this stage, called statistical learning of the data, analysts have two main objectives: (1) develop a statistical model that captures the behavior of the process from a sample of the data, and (2) identify anomalies in the process. However, several open challenges still exist in this framework for big data analytics. Recently, data types such as free-text data are also being captured. Although many established processing techniques exist for other data types, free-text data comes from a wide range of individuals and is subject to syntax, grammar, language, and colloquialisms that require substantially different processing approaches. Once the data is processed, open challenges still exist in the statistical learning step of understanding the data. Statistical learning aims to satisfy two objectives: (1) develop a model that highlights general patterns in the data, and (2) create a signaling mechanism to identify whether outliers are present in the data. Statistical modeling is widely utilized, as researchers have created a variety of statistical models to explain everyday phenomena such as energy usage behavior, traffic patterns, and stock market behavior, among others. However, new applications of big data with increasingly varied designs present interesting challenges. Consider the example of free-text analysis posed above: there is renewed interest in modeling free-text narratives from sources such as online reviews, customer complaints, or patient safety event reports into intuitive themes or topics. As previously mentioned, documents describing the same phenomena can vary widely in their word usage and structure. Another recent interest area of statistical learning is using the environmental conditions that people live, work, and grow in to infer their quality of life. It is well established that social factors play a role in overall health outcomes; however, the clinical application of these social determinants of health is a recent and open problem. These examples are just a few of many wherein new applications of big data pose complex challenges requiring thoughtful and inventive approaches to processing, analyzing, and modeling data. Although a large body of research exists in the area of anomaly detection, increasingly complicated data sources (such as side-channel-related data or network-based data) present equally convoluted challenges.
For effective anomaly detection, analysts define parameters and rules so that when large collections of raw data are aggregated, pieces of data that do not conform are easily noticed and flagged. In this work, I investigate the different steps of the data analytics framework and propose improvements for each step, paired with practical applications, to demonstrate the efficacy of my methods. This work focuses on the healthcare, manufacturing, and social-networking industries, but the material is broad enough to have wide applications across data analytics generally. My main contributions can be summarized as follows:
• In the big data analytics framework, raw data initially goes through a pre-processing step. Although many pre-processing techniques exist, there are several challenges in pre-processing text data, and I develop a pre-processing tool for text data.
• In the next step of the data analytics framework, there are challenges in both statistical modeling and anomaly detection.
  ◦ I address the research area of statistical modeling in two ways:
    - There are open challenges in defining models to characterize text data. I introduce a community extraction model that autonomously aggregates text documents into intuitive communities/groups.
    - In health care, it is well established that social factors play a role in overall health outcomes; however, developing a statistical model that characterizes these relationships is an open research area. I developed statistical models for generalizing relationships between the social determinants of health of a cohort and general medical risk factors.
  ◦ I address the research area of anomaly detection in two ways:
    - A variety of anomaly detection techniques already exist; however, some of these methods lack a rigorous statistical investigation, making them ineffective for a practitioner. I identify critical shortcomings of a proposed network-based anomaly detection technique and introduce methodological improvements.
    - Manufacturing enterprises, which are now more connected than ever, are vulnerable to anomalies in the form of cyber-physical attacks. I developed a sensor-based side-channel technique for anomaly detection in a manufacturing process.
/ PHD / The prevalence of big data has rapidly changed the usage and mechanisms of data analytics within organizations. The fields of manufacturing and healthcare are two examples of industries currently undergoing significant transformations due to the rise of big data. The addition of large sensory systems is changing how parts are manufactured and inspected, and the prevalence of Health Information Technology (HIT) systems is changing the way healthcare services are delivered. These industries are turning to big data analytics in the hopes of acquiring many of the benefits other sectors are experiencing, including reduced cost, improved safety, and boosted productivity. However, many challenges exist along the big data analytics framework, from pre-processing raw data, to statistical modeling of the data, to identifying anomalies present in the data or process. This work offers significant contributions in each of these areas, together with practical real-world applications, as summarized above.
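
As an illustration of the kind of text pre-processing the first contribution concerns (a generic stdlib sketch, not the dissertation's tool): free-text narratives are normalized, tokenized, filtered, and reduced to term counts before any statistical modeling. The stopword list here is a tiny illustrative subset:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "for"}

def preprocess(document):
    """Lowercase, keep alphabetic tokens, and drop stopwords and short words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [tok for tok in tokens if tok not in STOPWORDS and len(tok) > 2]

def term_counts(documents):
    """Bag-of-words counts per document, the input to topic/community models."""
    return [Counter(preprocess(doc)) for doc in documents]

reports = [
    "Patient reported dizziness after the dose was increased.",
    "Dose increase led to reported dizziness and nausea in patient.",
]
for counts in term_counts(reports):
    print(counts.most_common(5))
```
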
307

Anomaly Detection for Monocular Camera-based Distance Estimation in Autonomous Driving

Ge, Muchen January 2024
With the development of Autonomous Driving (AD) technology, there is growing concern over its safety, and finding methods to improve the reliability of this technology is a current challenge. An AD system is composed of a perception module, a planning module, and a control module. The perception module, which provides information about the environment for the whole system, is a critical part of the AD system. This project aims to provide a better understanding of the functionality and reliability of the perception module of an AD system. In this project, a simple model of the perception module is built with YOLOv5-nano for object detection, StrongSORT for object tracking, and MonoDepth2 for depth estimation. The system takes images from a single camera as input and produces a time series of distances to the preceding vehicle. Fault injection techniques are used to test the reliability of the system: different faults, including weather factors, sensor faults, and encoder faults, are injected, and the system's behavior under faults is observed and analyzed. Then multiple methods for anomaly detection are applied to the time series of distance data, including the statistical method ARIMA and the machine learning methods MLP and LSTM. Comparisons are made among the anomaly detection methods based on efficiency and performance. The dataset in this project is generated by the CARLA simulator.
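
For the ARIMA-based detector, one standard recipe is to fit the model to the distance series and flag time steps whose residuals exceed a few standard deviations. Below is a hedged sketch with statsmodels on synthetic distances; the model order, threshold, and injected fault are illustrative choices, not the thesis's settings:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)

# Synthetic distance-to-lead-vehicle series with an injected fault segment.
t = np.arange(600)
distance = 30 + 5 * np.sin(2 * np.pi * t / 150) + rng.normal(0, 0.3, t.size)
distance[400:420] += 8.0                     # e.g. a depth-estimation fault

fit = ARIMA(distance, order=(2, 1, 2)).fit()
resid = np.asarray(fit.resid)

# Flag residuals beyond 3 sigma, estimated on the presumed-normal prefix.
sigma = resid[:300].std()
flags = np.flatnonzero(np.abs(resid) > 3 * sigma)
print("flagged steps:", flags)
```
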
308

Autoencoder-based anomaly detection in time series : Application to active medical devices

Gietzelt, Marie January 2024
The aim of this thesis is to derive an unsupervised method for detecting anomalies in time series. Autoencoder-based approaches are widely used for this task: a model learns to reconstruct the pattern of the given data, the idea being that the model will be good at reconstructing data that does not contain anomalous behavior. If the model fails to reconstruct an observation, it is marked as anomalous. In this thesis, the derived method is applied to data from active medical devices manufactured by B. Braun. The given data consist of 6,000 time series of varying length, with an average length greater than 14,000; hence, the sample size is small compared to the series' lengths. Taking expert knowledge about the data into account, subsequences of the same pattern, where anomalies are expected to appear, can be extracted from the time series. By considering these subsequences for model training, the problem can be translated into one with a large dataset of short time series. It is shown that a common autoencoder is able to reconstruct anomalies well and is therefore not useful for solving the task. It is demonstrated that a variational autoencoder works better, as there are large differences between the given anomalous observations and their reconstructions. Furthermore, several thresholds for these differences are compared. The relative numbers of detected anomalies in the two given datasets are 3.12% and 5.03%.
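
The threshold-comparison step is independent of the model: given reconstruction errors from a trained (variational) autoencoder, candidate cutoffs are computed on presumed-normal data and compared by how many observations they flag. A small illustrative sketch with synthetic error values standing in for the model's output:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-ins for reconstruction errors from a trained variational autoencoder.
err_normal = rng.gamma(shape=2.0, scale=0.5, size=5000)    # training data
err_new = np.concatenate([rng.gamma(2.0, 0.5, 950),        # unseen normal data
                          rng.gamma(2.0, 2.5, 50)])        # anomalous tail

thresholds = {
    "mean + 3*std":    err_normal.mean() + 3 * err_normal.std(),
    "99th percentile": np.percentile(err_normal, 99),
    "max of training": err_normal.max(),
}
for name, thr in thresholds.items():
    rate = 100 * (err_new > thr).mean()
    print(f"{name:>16}: threshold {thr:.2f} flags {rate:.2f}% of observations")
```
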
309

Cascaded Ensembling for Resource-Efficient Multivariate Time Series Anomaly Detection

Mapitigama Boththanthrige, Dhanushki Pavithya January 2024
The rapid evolution of Connected and Autonomous Vehicles (CAVs) has led to a surge in research on efficient anomaly detection methods to ensure their safe and reliable operation. While state-of-the-art deep learning models offer promising results in this domain, their high computational requirements present challenges for deployment in resource-constrained environments, such as the Electronic Control Units (ECUs) in vehicles. In this context, we consider using an ensemble learning technique, specifically a cascaded modeling approach, for real-time and resource-efficient multivariate time-series anomaly detection in CAVs. The study was done in collaboration with SCANIA, a transport solutions provider; the company is undergoing a transformation towards providing autonomous and sustainable solutions, and this work contributes to that transformation. Our methodology employs unsupervised learning techniques to construct a cascade of models, comprising a coarse-grained model with lower computational complexity at level one and a more intricate fine-grained model at level two. Furthermore, we incorporate cascaded model training to refine the complex model's ability to make decisions on uncertain and anomalous events, leveraging insights from the simpler model. Through extensive experimentation, we investigate the trade-off between model performance and computational complexity, demonstrating that our proposed cascaded model achieves greater efficiency with no performance degradation. Further, we present a comparative analysis of the impact of probabilistic versus deterministic approaches and assess the feasibility of model training in edge environments using the Federated Learning concept.
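
The cascade idea can be sketched in a few lines: a cheap level-one filter scores every window, and only suspicious windows are forwarded to the heavier level-two model. The split into a z-score screen and an IsolationForest below is our illustrative choice on synthetic data, not SCANIA's models:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(6)

# Windows of a synthetic 8-channel signal, with anomalies near the end.
X = rng.normal(size=(5000, 8))
X[4700:4720] += 4.0                               # anomalous windows

# Level 1: cheap z-score screen using statistics from a normal reference set.
mu, sd = X[:4000].mean(axis=0), X[:4000].std(axis=0)
z = np.abs((X - mu) / sd).max(axis=1)
suspicious = z > 2.0                              # only these go to level 2

# Level 2: the heavier model scores only the forwarded windows.
level2 = IsolationForest(random_state=0).fit(X[:4000])
flags = np.zeros(len(X), dtype=bool)
idx = np.flatnonzero(suspicious)
flags[idx] = level2.predict(X[idx]) == -1

print(f"level 2 ran on {suspicious.mean():.1%} of windows,"
      f" flagged {int(flags.sum())} anomalies")
```

The resource saving comes from the fraction of windows the expensive model never sees, which is the trade-off the abstract describes.
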
310

Machine Learning for Complex Evaluation and Detection of Combustion Health of Industrial Gas Turbines

Mshaleh, Mohammad January 2024
This study addresses the challenge of identifying anomalies within multivariate time-series data, focusing specifically on the operational parameters of gas turbine combustion systems. In search of an effective detection method, the research explores the application of three distinct machine learning methods: the Long Short-Term Memory (LSTM) autoencoder, the Self-Organizing Map (SOM), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Through experiments, these models are evaluated to determine their efficacy in anomaly detection. The findings show that the LSTM autoencoder not only surpasses its counterparts on performance metrics but also demonstrates a unique capability to identify the underlying causes of detected anomalies. The thesis delves into a comparative analysis of these techniques and discusses the implications of the models for maintaining the reliability and safety of gas turbine operations.
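
Of the three methods compared, DBSCAN is the most compact to illustrate: operating points are clustered by density, and points that join no cluster (label -1) are treated as anomalies. A hedged sketch on synthetic turbine-like parameters; the operating modes, scales, and DBSCAN settings are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic combustion parameters: two normal operating modes plus faults.
mode_a = rng.normal([500.0, 10.0, 0.8], 0.5, size=(400, 3))
mode_b = rng.normal([550.0, 12.0, 0.9], 0.5, size=(400, 3))
faults = rng.normal([620.0, 18.0, 0.4], 2.0, size=(10, 3))
X = StandardScaler().fit_transform(np.vstack([mode_a, mode_b, faults]))

# Density clustering; DBSCAN labels sparse points -1 (noise = anomalies).
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)
print("anomalies (label -1):", int((labels == -1).sum()), "of", len(X))
```
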
