301 |
Deep Contrastive Metric Learning to Detect Polymicrogyria in Pediatric Brain MRIZhang, Lingfeng 28 November 2022 (has links)
Polymicrogyria (PMG) is one brain disease that mainly occurs in the pediatric brain. Heavy PMG will cause seizures, delayed development, and a series of problems. For this reason, it is critical to effectively identify PMG and start early treatment. Radiologists typically identify PMG through magnetic resonance imaging scans. In this study, we create and open a pediatric MRI dataset (named PPMR dataset) including PMG and controls from the Children's Hospital of Eastern Ontario (CHEO), Ottawa, Canada. The difference between PMG MRIs and control MRIs is subtle and the true distribution of the features of the disease is unknown. Hence, we propose a novel center-based deep contrastive metric learning loss function (named cDCM Loss) to deal with this difficult problem. Cross-entropy-based loss functions do not lead to models with good generalization on small and imbalanced dataset with partially known distributions. We conduct exhaustive experiments on a modified CIFAR-10 dataset to demonstrate the efficacy of our proposed loss function compared to cross-entropy-based loss functions and the state-of-the-art Deep SAD loss function. Additionally, based on our proposed loss function, we customize a deep learning model structure that integrates dilated convolution, squeeze-and-excitation blocks and feature fusion for our PPMR dataset, to achieve 92.01% recall. Since our suggested method is a computer-aided tool to assist radiologists in selecting potential PMG MRIs, 55.04% precision is acceptable. To our best knowledge, this research is the first to apply machine learning techniques to identify PMG only from MRI and our innovative method achieves better results than baseline methods.
|
302 |
PRAAG Algorithm in Anomaly DetectionZhang, Dongyang January 2016 (has links)
Anomaly detection has been one of the most important applications of datamining, widely applied in industries like financial, medical,telecommunication, even manufacturing. In many scenarios, data are in theform of streaming in a large amount, so it is preferred to analyze the datawithout storing all of them. In other words, the key is to improve the spaceefficiency of algorithms, for example, by extracting the statistical summary ofthe data. In this thesis, we study the PRAAG algorithm, a collective anomalydetection algorithm based on quantile feature of the data, so the spaceefficiency essentially depends on that of quantile algorithm.Firstly, the master thesis investigates quantile summary algorithms thatprovides quantile information of a dataset without storing all the data point.Then, we implement the selected algorithms and run experiments to test theperformance. Finally, the report focuses on experimenting on PRAAG tounderstand how the parameters affect the performance and compare it withother anomaly detection algorithms.In conclusion, GK algorithm provides a more space efficient way to estimatequantiles than simply storing all data points. Also, PRAAG is effective in termsof True Prediction Rate (TPR) and False Prediction Rate (FPR), comparingwith a baseline algorithm CUSUM. In addition, there are many possibleimprovements to be investigated, such as parallelizing the algorithm. / Att upptäcka avvikelser har varit en av de viktigaste tillämpningarna avdatautvinning (data mining). Det används stor utsträckning i branscher somfinans, medicin, telekommunikation, och även tillverkning. I många fallströmmas stora mängder data och då är det mest effektivt att analysera utanatt lagra data. Med andra ord är nyckeln att förbättra algoritmernasutrymmeseffektivitet till exempel genom att extraheraden statistiskasammanfattning avdatat. PRAAGär en kollektiv algoritm för att upptäckaavvikelser. Den ärbaserad på kvantilenegenskapernai datat, såutrymmeseffektiviteten beror i huvudsak på egenskapernahoskvantilalgoritmen.Examensarbetet undersöker kvantilsammanfattande algoritmer som gerkvantilinformationen av ett dataset utan att spara alla datapunkter. Vikommer fram till att GKalgoritmenuppfyllervåra krav. Sedan implementerarvialgoritmerna och genomför experiment för att testa prestandan. Slutligenfokuserar rapporten påexperiment på PRAAG för att förstå hur parametrarnapåverkar prestandan. Vi jämför även mot andra algoritmer för att upptäckaavvikelser.Sammanfattningsvis ger GK ett mer utrymmeseffektiv sätt att uppskattakvantiler än att lagra alla datapunkter. Dessutom är PRAAG, jämfört med enstandardalgoritm (CUSUM), effektiv när det gäller True Prediction Rate (TPR)och False Prediction Rate (FPR). Det finns fortfarande flertalet möjligaförbättringar som ska undersökas, t.ex. parallelisering av algoritmen.
|
303 |
Water Anomaly Detection Using Federated Machine LearningWallén, Melker, Böckin, Mauricio January 2021 (has links)
With the rapid increase of Internet of Things-devices(IoT), demand for new machine learning algorithms and modelshas risen. The focus of this project is implementing a federatedlearning (FL) algorithm to detect anomalies in measurementsmade by a water monitoring IoT-sensor. The FL algorithm trainsacross a collection of decentralized IoT-devices, each using thelocal data acquired from the specific sensor. The local machinelearning models are then uploaded to a mutual server andaggregated into a global model. The global model is sent back tothe sensors and is used as a template when training starts againlocally. In this project, we only have had access to one physicalsensor. This has forced us to virtually simulate sensors. Thesimulation was done by splitting the data gathered by the onlyexisting sensor. To deal with the long, sequential data gatheredby the sensor, a long short-term memory (LSTM) network wasused. This is a special type of artificial neural network (ANN)capable of learning long-term dependencies. After analyzing theobtained results it became clear that FL has the potential toproduce good results, provided that more physical sensors aredeployed. / I samband med den snabba ökningen avInternet of Things-enheter (IoT) har efterfrågan på nya algoritmeroch modeller för maskininlärning ökat. Detta projektfokuserar på att implementera en federated learning (FL) algoritmför att detektera avvikelser i mätdata från en sensorsom övervakar vattenkvaliteten. FL algoritmen tränar en samlingdecentraliserade IoT-enheter, var och en med hjälp av lokaldata från sensorn i fråga. De lokala maskininlärningsmodellernaladdas upp till en gemensam server och sammanställs till englobal modell. Den globala modellen skickas sedan tillbaka tillsensorerna och används som mall när den lokala träningen börjarigen. I det här projektet hade vi endast tillgång till en fysisksensor. Vi har därför varit tvungna att simulera sensorer. Dettagjordes genom att dela upp datamängden som samlats in frånden fysiska sensorn. För att hantera den långa sekventiella dataanvänds ett long short-term memory (LSTM) nätverk. Detta ären speciell typ av artificiellt neuronnät (ANN) som är kapabeltatt minnas mönster under en längre tid. Efter att ha analyseratresultaten blev det tydligt att FL har potentialen att produceragoda resultat, givet att fler fysiska sensorer implementeras. / Kandidatexjobb i elektroteknik 2021, KTH, Stockholm
|
304 |
Anomaly Detection using LSTM N. Networks and Naive Bayes Classifiers in Multi-Variate Time-Series Data from a Bolt Tightening Tool / Anomali detektion med LSTM neuralt nätverk och Naive Bayes klassificerare av multivariabel tidsseriedata från en mutterdragareSelander, Karl-Filip January 2021 (has links)
In this thesis, an anomaly detection framework has been developed to aid in maintenance of tightening tools. The framework is built using LSTM networks and gaussian naive bayes classifiers. The suitability of LSTM networks for multi-variate sensor data and time-series prediction as a basis for anomaly detection has been explored. Current literature and research is mostly concerned with uni-variate data, where LSTM based approaches have had variable but often good results. However, most real world settings with sensor networks, such as the environment and tool from which this thesis data is gathered, are multi-variable. Thus, there is a need to research the effectiveness of the LSTM model in this setting. The thesis has emphasized the need of well defined evaluation metrics of anomaly detection approaches, the difficulties of defining anomalies and anomaly datasets, as well as illustrated the effectiveness of LSTM networks in multi-variate environments. / I den här uppsatsen har ett anomali detektions ramverk utvecklats för att bidra till underhållandet av åtdragarverktyg. Ramverket bygger på LSTM neurala nätverk och gaussian Naive Bayes klassificerare. Användbarheten av LSTM nätverk för multi-variabel data och tidsserie prediktion som basis för anomali detektion har undersökts. Nutida literatur och forskning berör mest envariabel data där LSTM baserade metoder ofta har presterat bra. Men, de flesta system i verkligheten är inte envariabel utan multivariabel, som den miljö verktyget, vars data undersöks i den här uppsatsen, opererar i. Därför anses det att det finns ett behov att undersöka användbarheten av LSTM modeller i den här typen av miljö. Det här arbetet har betonat vikten av väldefinierade utvärderingsvärden för anomali detektion, svårigheterna med att definiera anomalier och anomalidataset, samt illustrerat användbarheten av LSTM nätverk i multivariabla miljöer.
|
305 |
Jet Printing Quality ImprovementThrough Anomaly Detection UsingMachine Learning / Kvalitetsförbättring i jetprinting genom avvikelseidentifiering med maskinlärningLind, Henrik, Janssen, Jacob January 2021 (has links)
This case study examined emitted sound and actuated piezoelectric current in a solderpaste jet printing machine to conclude whether quality degradation could be detected with an autoencoder machine learning model. An autoencoder was used to detect anomalies in non-realtime that were defined asa diameter drift with an averaging window from a target diameter. A sensor and datacollection system existed for the piezoelectric current, and a microphone was proposedas a new sensor to monitor the system. The sound was preprocessed with a Fast Fourier Transform to extract information of the existing frequencies. The results of the model, visualized through reconstruction error plots and an Area Under the Curve score, show that the autoencoder successfully detected conspicuous anomalies. The study indicated that anomalies can be detected prior to solder paste supply failure using the sound. When the temperature was varied or when the jetting head nozzle was clogged by residual solder paste, the sound model identified most anomalies although the current network showed better performance. / Denna fallstudie undersökte emitterat ljud och drivande piezoelektrisk ström i en jetprinter med lödpasta för att dra slutsatsen om kvalitetsbrister kunde detekteras med en autoencoder maskininlärningsmodell. En autoencoder användes för att detektera avvikelser definierade som diametertrend med ett glidande medelvärde från en bördiameter. Tidigare studier har visat att den piezoelektriska strömmen i liknande maskiner kan användas för att förutspå kvalitetsbrister. En mikrofon föreslogs som en ny sensor för att övervaka systemet. Ljudet förbehandlades genom en snabb fouriertransform och frekvensinnehållet användes som indata i modellen. Resultaten visualiserades genom rekonstruktionsfel och metoden Area Under the Curve. Modellen upptäckte framgångsrikt tydliga avvikelser. För vissa felfall visade ljudet som indata bättre prestanda än strömmen, och för andra visade strömmen bättre prestanda. Till exempel indikerade studien att avvikelser kan detekteras före lodpasta-försörjningsfel med ljudet. Under varierande temperatur och då munstycket var igentäppt av kvarvarande lödpasta identifierade nätverket med ljud som indata de flesta avvikelser även om nätverket med strömmen visade bättre prestanda.
|
306 |
Anomaly Detection for Water Quality DataYAN, YAN January 2019 (has links)
Real-time water quality monitoring using automated systems with sensors is becoming increasingly common, which enables and demands timely identification of unexpected values. Technical issues create anomalies, which at the rate of incoming data can prevent the manual detection of problematic data.
This thesis deals with the problem of anomaly detection for water quality data using machine learning and statistic learning approaches. Anomalies in data can cause serious problems in posterior analysis and lead to poor decisions or incorrect conclusions. Five time series anomaly detection techniques: local outlier factor (machine learning), isolation forest (machine learning), robust random cut forest (machine learning), seasonal hybrid extreme studentized deviate (statistic learning approach), and exponential moving average (statistic learning approach) have been analyzed. Extensive experimental analysis of those techniques have been performed on data sets collected from sensors deployed in a wastewater treatment plant.
The results are very promising. In the experiments, three approaches successfully detected anomalies in the ammonia data set. With the temperature data set, the local outlier factor successfully detected all twenty-six outliers whereas the seasonal hybrid extreme studentized deviate only detected one anomaly point. The exponential moving average identified ten time ranges with anomalies. Eight of them cover a total of fourteen anomalies. The reproducible experiments demonstrate that local outlier factor is a feasible approach for detecting anomalies in water quality data. Isolation forest and robust random cut forest also rate high anomaly scores for the anomalies. The result of the primary experiment confirms that local outlier factor is much faster than isolation forest, robust random cut forest, seasonal hybrid extreme studentized deviate and exponential moving average. / Thesis / Master of Computer Science (MCS)
|
307 |
Data Analytics for Statistical LearningKomolafe, Tomilayo A. 05 February 2019 (has links)
The prevalence of big data has rapidly changed the usage and mechanisms of data analytics within organizations. Big data is a widely-used term without a clear definition. The difference between big data and traditional data can be characterized by four Vs: velocity (speed at which data is generated), volume (amount of data generated), variety (the data can take on different forms), and veracity (the data may be of poor/unknown quality). As many industries begin to recognize the value of big data, organizations try to capture it through means such as: side-channel data in a manufacturing operation, unstructured text-data reported by healthcare personnel, various demographic information of households from census surveys, and the range of communication data that define communities and social networks.
Big data analytics generally follows this framework: first, a digitized process generates a stream of data, this raw data stream is pre-processed to convert the data into a usable format, the pre-processed data is analyzed using statistical tools. In this stage, called statistical learning of the data, analysts have two main objectives (1) develop a statistical model that captures the behavior of the process from a sample of the data (2) identify anomalies in the process.
However, several open challenges still exist in this framework for big data analytics. Recently, data types such as free-text data are also being captured. Although many established processing techniques exist for other data types, free-text data comes from a wide range of individuals and is subject to syntax, grammar, language, and colloquialisms that require substantially different processing approaches. Once the data is processed, open challenges still exist in the statistical learning step of understanding the data.
Statistical learning aims to satisfy two objectives, (1) develop a model that highlights general patterns in the data (2) create a signaling mechanism to identify if outliers are present in the data. Statistical modeling is widely utilized as researchers have created a variety of statistical models to explain everyday phenomena such as predicting energy usage behavior, traffic patterns, and stock market behaviors, among others. However, new applications of big data with increasingly varied designs present interesting challenges. Consider the example of free-text analysis posed above. There's a renewed interest in modeling free-text narratives from sources such as online reviews, customer complaints, or patient safety event reports, into intuitive themes or topics. As previously mentioned, documents describing the same phenomena can vary widely in their word usage and structure.
Another recent interest area of statistical learning is using the environmental conditions that people live, work, and grow in, to infer their quality of life. It is well established that social factors play a role in overall health outcomes, however, clinical applications of these social determinants of health is a recent and an open problem. These examples are just a few of many examples wherein new applications of big data pose complex challenges requiring thoughtful and inventive approaches to processing, analyzing, and modeling data.
Although a large body of research exists in the area of anomaly detection increasingly complicated data sources (such as side-channel related data or network-based data) present equally convoluted challenges. For effective anomaly-detection, analysts define parameters and rules, so that when large collections of raw data are aggregated, pieces of data that do not conform are easily noticed and flagged.
In this work, I investigate the different steps of the data analytics framework and propose improvements for each step, paired with practical applications, to demonstrate the efficacy of my methods. This paper focuses on the healthcare, manufacturing and social-networking industries, but the materials are broad enough to have wide applications across data analytics generally. My main contributions can be summarized as follows:
• In the big data analytics framework, raw data initially goes through a pre-processing step. Although many pre-processing techniques exist, there are several challenges in pre-processing text data and I develop a pre-processing tool for text data.
• In the next step of the data analytics framework, there are challenges in both statistical modeling and anomaly detection
o I address the research area of statistical modeling in two ways:
- There are open challenges in defining models to characterize text data. I introduce a community extraction model that autonomously aggregates text documents into intuitive communities/groups
- In health care, it is well established that social factors play a role in overall health outcomes however developing a statistical model that characterizes these relationships is an open research area. I developed statistical models for generalizing relationships between social determinants of health of a cohort and general medical risk factors
o I address the research area of anomaly detection in two ways:
- A variety of anomaly detection techniques exist already, however, some of these methods lack a rigorous statistical investigation thereby making them ineffective to a practitioner. I identify critical shortcomings to a proposed network based anomaly detection technique and introduce methodological improvements
- Manufacturing enterprises which are now more connected than ever are vulnerably to anomalies in the form of cyber-physical attacks. I developed a sensor-based side-channel technique for anomaly detection in a manufacturing process / PHD / The prevalence of big data has rapidly changed the usage and mechanisms of data analytics within organizations. The fields of manufacturing and healthcare are two examples of industries that are currently undergoing significant transformations due to the rise of big data. The addition of large sensory systems is changing how parts are being manufactured and inspected and the prevalence of Health Information Technology (HIT) systems in healthcare systems is also changing the way healthcare services are delivered. These industries are turning to big data analytics in the hopes of acquiring many of the benefits other sectors are experiencing, including reducing cost, improving safety, and boosting productivity. However, there are many challenges that exist along with the framework of big data analytics, from pre-processing raw data, to statistical modeling of the data, and identifying anomalies present in the data or process. This work offers significant contributions in each of the aforementioned areas and includes practical real-world applications.
Big data analytics generally follows this framework: first, a digitized process generates a stream of data, this raw data stream is pre-processed to convert the data into a usable format, the pre-processed data is analyzed using statistical tools. In this stage, called ‘statistical learning of the data’, analysts have two main objectives (1) develop a statistical model that captures the behavior of the process from a sample of the data (2) identify anomalies or outliers in the process.
In this work, I investigate the different steps of the data analytics framework and propose improvements for each step, paired with practical applications, to demonstrate the efficacy of my methods. This work focuses on the healthcare and manufacturing industries, but the materials are broad enough to have wide applications across data analytics generally. My main contributions can be summarized as follows:
• In the big data analytics framework, raw data initially goes through a pre-processing step. Although many pre-processing techniques exist, there are several challenges in pre-processing text data and I develop a pre-processing tool for text data.
• In the next step of the data analytics framework, there are challenges in both statistical modeling and anomaly detection
o I address the research area of statistical modeling in two ways:
- There are open challenges in defining models to characterize text data. I introduce a community extraction model that autonomously aggregates text documents into intuitive communities/groups
- In health care, it is well established that social factors play a role in overall health outcomes however developing a statistical model that characterizes these relationships is an open research area. I developed statistical models for generalizing relationships between social determinants of health of a cohort and general medical risk factors
o I address the research area of anomaly detection in two ways:
- A variety of anomaly detection techniques exist already, however, some of these methods lack a rigorous statistical investigation thereby making them ineffective to a practitioner. I identify critical shortcomings to a proposed network-based anomaly detection technique and introduce methodological improvements
- Manufacturing enterprises which are now more connected than ever are vulnerable to anomalies in the form of cyber-physical attacks. I developed a sensor-based side-channel technique for anomaly detection in a manufacturing process.
|
308 |
Anomaly Detection for Monocular Camera-based Distance Estimation in Autonomous Driving / Avvikelsedetektion för monokulär kamerabaserad distanssuppskattning vid autonom körningGe, Muchen January 2024 (has links)
With the development of Autonomous Driving (AD) technology, there is a growing concern over the safety of the technology. Finding methods to improve the reliability of this technology becomes a current challenge. The AD system is composed of a perception module, a planning module, and a control module. The perception module, which provides information about the environment for the whole system, is a critical part of the AD system. This project aims to provide a better understanding of the functionality and reliability of the perception module of an AD system. In this project, a simple model of the perception module is built with YOLOv5-nano for object detection, StrongSORT for object tracking, and MonoDepth2 for depth estimation. The system takes images from a single camera as input and produces a time series of distance to the preceding vehicle. Fault injection technologies are utilized for testing the reliability of the system. Different faults, including weather factors, sensor faults, and encoder faults, are injected. The system behaviors under faults are observed and analyzed. Then multiple methods for anomaly detection are applied to the time series of distance data, including the statistic method ARIMA, and the machine learning methods MLP and LSTM. Comparisons are made among the anomaly detection methods, based on the efficiency and performance. The dataset in this project is generated by the CARLA simulator. / Med utvecklingen av tekniken för autonom körning (AD) växer oro över teknologins säkerhet. Att hitta metoder för att förbättra tillförlitligheten hos denna teknologi blir en aktuell utmaning. AD-systemet består av en perceptionsmodul, en planeringsmodul och en styrmodul. Perceptionsmodulen, som tillhandahåller information om miljön för hela systemet, är en kritisk del av AD-systemet. Detta projekt syftar till att ge en bättre förståelse för funktionaliteten och tillförlitligheten hos perceptionsmodulen i ett AD-system. I detta projekt byggs en enkel modell av perceptionsmodulen med YOLOv5-nano för objektdetektion, StrongSORT för objektföljning och MonoDepth2 för djupuppskattning. Systemet tar bilder från en enda kamera som inmatning och producerar en tidsserie av avståndet till det föregående fordonet. Felinjektionstekniker används för att testa systemets tillförlitlighet. Olika fel, inklusive väderfaktorer, sensorfel och maskininlärningsfel, injiceras. Systemets beteende under fel observeras och analyseras. Därefter tillämpas flera metoder för avvikelsedetektering på tidsserien av avstånd, inklusive statistikmetoden ARIMA samt maskininlärningsmetoderna MLP och LSTM. Jämförelser görs mellan avvikelsedetekteringsmetoderna, baserat på effektivitet och prestanda. Datamängden i detta projekt genereras av CARLAsimulatorn.
|
309 |
Machine learning for complex evaluation and detection of combustion health of Industrial Gas turbinesMshaleh, Mohammad January 2024 (has links)
This study addresses the challenge of identifying anomalies within multivariate time series data, focusing specifically on the operational parameters of gas turbine combustion systems. In search of an effective detection method, the research explores the application of three distinct machine learning methods: the Long Short-Term Memory (LSTM) autoencoder, the Self-Organizing Map (SOM), and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Through the experiment, these models are evaluated to determine their efficacy in anomaly detection. The findings show that the LSTM autoencoder not only surpasses its counterparts in performance metrics but also shows a unique capability to identify the underlying causes of detected anomalies. This paper delves into the comparative analysis of these techniques and discusses the implications of the models in maintaining the reliability and safety of gas turbine operations.
|
310 |
Anomaly Detection In Heterogeneous IoT Systems: Leveraging Symbolic Encoding Of Performance Metrics For Anomaly ClassificationPatel, Maanav 01 June 2024 (has links) (PDF)
Anomaly detection in Internet of Things (IoT) systems has become an increasingly popular field of research as the number of IoT devices proliferate year over year. Recent research often relies on machine learning algorithms to classify sensor readings directly. However, this approach leads to solutions being non-portable and unable to be applied to varying IoT platform infrastructure, as they are trained with sensor data specific to one configuration. Moreover, sensors generate varying amounts of non-standard data which complicates model training and limits generalization. This research focuses on addressing these problems in three ways a) the creation of an IoT Testbed which is configurable and parameterizable for dataset generation, b) the usage of system performance metrics as the dataset for training the anomaly classifier which ensures a fixed dataset size, and c) the application of Symbolic Aggregate Approximation (SAX) to encode patterns in system performance metrics which allows our trained Long Short-Term Memory (LSTM) model to classify anomalies agnostic to the underlying system configuration. Our devised IoT Testbed provides a lightweight setup for data generation which directly reflects some of the most pertinent components of Industry 4.0 pipelines including a MQTT Broker, Apache Kafka, and Apache Cassandra. Additionally, our proposed solution provides improved portability over state-of-the-art models while standardizing the required training data. Results demonstrate the effectiveness of utilizing symbolized performance metrics as we were able to achieve accuracies of 95.87%, 87.33%, and 87.47% for three different IoT system configurations. The latter two accuracies represent the model’s ability to be generalized to datasets generated from differing system configurations.
|
Page generated in 0.4636 seconds