CURVILINEAR STRUCTURE DETECTION IN IMAGES BY CONNECTED-TUBE MARKED POINT PROCESS AND ANOMALY DETECTION IN TIME SERIES. Tianyu Li (15349048), 26 April 2023 (has links)
<p><em>Curvilinear structure detection in images has been investigated for decades. In general, it involves two tasks: binary segmentation of the image and inference of the graph representation of the curvilinear network. In our work, we propose a connected-tube model based on a marked point process (MPP) to address both tasks. The proposed tube model is applied to fiber detection in microscopy images by combining connected-tube and ellipse models. Moreover, a tube-based segmentation algorithm is proposed to improve segmentation accuracy. Experiments on fiber-reinforced polymer images, satellite images, and retinal vessel images are presented. Additionally, we extend the 2D tube model to a 3D tube model, with each tube modeled as a cylinder. To investigate supervised curvilinear structure detection, we focus on road detection in satellite images and propose a two-stage learning strategy for road segmentation. A probability map is generated in the first stage by a selected neural network; we then attach the probability map to the original RGB image and feed the resulting four-channel image to a U-Net-like network in the second stage to obtain a refined result.</em></p>
<p><em>Anomaly detection in time series is a key step in diagnosing abnormal behavior in some systems. Long Short-Term Memory networks (LSTMs) have been demonstrated to be useful for anomaly detection in time series, due to their predictive power. However, for a system with thousands of different time sequences, a single LSTM predictor may not perform well for all the sequences. To enhance adaptability, we propose a stacked predictor framework. Also, we propose a novel dynamic thresholding algorithm based on the prediction errors to extract the potential anomalies. To further improve the accuracy of anomaly detection, we propose a post-detection verification method based on a fast and accurate time series subsequence matching algorithm.</em></p>
<p><em>To detect anomalies in multi-channel time series, a bi-directional transformer-based predictor is applied to generate the prediction error sequences, and a statistical model referred to as an anomaly marked point process (Anomaly-MPP) is proposed to extract the anomalies from the error sequences. The effectiveness of our methods is demonstrated by testing on a variety of time series datasets.</em></p>
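The error-based anomaly extraction described in these abstracts can be illustrated with a rolling-statistics thresholding sketch. This is a generic stand-in, not the thesis's dynamic thresholding algorithm or the Anomaly-MPP model; the window size and multiplier `k` are illustrative assumptions:

```python
import numpy as np

def dynamic_threshold_anomalies(errors, window=20, k=3.0):
    """Flag indices whose prediction error exceeds a rolling
    mean + k * std threshold computed over the trailing window.

    A generic sketch of error-based dynamic thresholding, not the
    specific algorithm proposed in the thesis.
    """
    errors = np.asarray(errors, dtype=float)
    anomalies = []
    for t in range(window, len(errors)):
        hist = errors[t - window:t]
        threshold = hist.mean() + k * hist.std()
        if errors[t] > threshold:
            anomalies.append(t)
    return anomalies
```

On a mostly stable error sequence, only samples that jump well outside the recent error distribution are flagged.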
Root-cause analysis with data-driven methods and machine learning in lithium-ion battery tests : Master's thesis about detecting deviations with PCA. Rademacher, Frans, January 2022 (has links)
The increased demand for energy storage systems and electric vehicles results in high demand for lithium-ion batteries. As a lithium-ion battery manufacturer, Northvolt runs quality tests on its products to assess their performance, life and safety. Tested batteries most often behave as expected, but deviations sometimes occur. Today, anomaly detection is usually performed by plotting produced data and comparing it against other test data to find which parameters deviate. The purpose of this thesis is to automate anomaly detection, and the proposed solution is to use state-of-the-art machine learning, both supervised and unsupervised. Before machine learning is applied, the feature engineering is presented: it describes which parameters are extracted from the experiment data sets. The supervised machine learning framework is then described. For unsupervised learning, a principal component analysis is presented to locate deviations. The thesis also presents a differential capacity analysis, as this could be incorporated with the features in the future. The results show that the subset of labeled data for supervised learning is too small to produce a model that predicts future deviations. The extracted features are also used in the principal component analysis, where the results show deviations (outliers) and help target the anomalies. These can then be used to determine the root cause of particular anomalies and mitigate future deviations.
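The PCA step used above to locate deviations can be sketched in plain numpy: project the feature matrix onto the leading principal components and score each row by its reconstruction error, so rows that do not fit the dominant structure stand out. The battery feature engineering and the choice of `n_components` are assumed, not taken from the thesis:

```python
import numpy as np

def pca_outlier_scores(X, n_components=2):
    """Score each row of X by its reconstruction error after
    projecting onto the top principal components. High scores
    suggest outliers.

    A minimal numpy sketch of PCA-based deviation detection.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components]          # (k, d) projection basis
    recon = Xc @ P.T @ P           # project and reconstruct
    return np.linalg.norm(Xc - recon, axis=1)
```

With features that mostly lie near a low-dimensional subspace, the deviating record receives the largest score.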
Fraud Detection on Unlabeled Data with Unsupervised Machine Learning / Bedrägeridetektering på omärkt data med oövervakad maskininlärning. Renström, Martin; Holmsten, Timothy, January 2018 (has links)
A common problem in systems handling user interaction is the risk of fraudulent behaviour. For example, in a system with credit card transactions it could be a person using another user's account for purchases, or in a system serving advertisements it could be bots clicking on ads. These malicious attacks are often disguised as normal interactions and can be difficult to detect. It is especially challenging when working with datasets that lack so-called labels, which indicate whether a data point is fraudulent or not. This means there is no data previously classified as fraud, which in turn makes it difficult to develop an algorithm that can distinguish between normal and fraudulent behavior. In this thesis, the area of anomaly detection is explored with the intent of detecting fraudulent behavior without labeled data. Three neural-network-based prototypes were developed, all variations of autoencoders. The first prototype, serving as a baseline, was a simple three-layer autoencoder; the second was a novel autoencoder called a stacked autoencoder; the third was a variational autoencoder. The prototypes were trained and evaluated on two datasets, both containing non-fraudulent and fraudulent data. The study found that the proposed stacked autoencoder architecture achieved better recall, accuracy and NPV in the tests designed to simulate a real-world scenario.
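The metrics reported for this comparison (recall, accuracy, NPV) follow from the binary confusion matrix. A plain-Python sketch, assuming model outputs have already been binarized against a reconstruction-error threshold; the test data below is illustrative:

```python
def binary_metrics(y_true, y_pred):
    """Compute recall, accuracy and negative predictive value (NPV)
    from binary labels (1 = fraud, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    npv = tn / (tn + fn) if tn + fn else 0.0
    return recall, accuracy, npv
```

NPV is worth reporting in fraud settings because the negative (normal) class dominates, so a detector that quietly misses fraud still shows high accuracy but a degraded NPV.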
Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM). Singh, Akash, January 2017 (has links)
We explore the use of Long short-term memory (LSTM) for anomaly detection in temporal data. Due to the challenges in obtaining labeled anomaly datasets, an unsupervised approach is employed. We train recurrent neural networks (RNNs) with LSTM units to learn the normal time series patterns and predict future values. The resulting prediction errors are modeled to give anomaly scores. We investigate different ways of maintaining LSTM state, and the effect of using a fixed number of time steps on LSTM prediction and detection performance. LSTMs are also compared to feed-forward neural networks with fixed size time windows over inputs. Our experiments, with three real-world datasets, show that while LSTM RNNs are suitable for general purpose time series modeling and anomaly detection, maintaining LSTM state is crucial for getting desired results. Moreover, LSTMs may not be required at all for simple time series.
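One common way to model prediction errors into anomaly scores, in the spirit of the approach above, is to fit a Gaussian to errors observed on normal data and score new errors by their (unnormalized) negative log-likelihood. This is a sketch of that generic step only; the LSTM predictor producing the errors is assumed:

```python
import numpy as np

def gaussian_anomaly_scores(train_errors, test_errors):
    """Fit a Gaussian to prediction errors from normal data, then
    score new errors by their squared z-score (negative
    log-likelihood up to an additive constant), so unlikely errors
    get high anomaly scores."""
    mu = float(np.mean(train_errors))
    sigma = float(np.std(train_errors)) or 1e-8  # guard zero variance
    z = (np.asarray(test_errors, dtype=float) - mu) / sigma
    return 0.5 * z ** 2
```

A threshold on these scores then separates ordinary prediction noise from anomalous behaviour.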
Anomaly Detection in Microservice Infrastructures / Anomalitetsdetektering i microservice-infrastrukturer. Ohlsson, Jonathan, January 2018 (has links)
Anomaly detection in time series is a broad field with many application areas and has been researched for many years. In recent years the need for monitoring and DevOps has increased, partly due to the increased usage of microservice infrastructures. Applying time series anomaly detection to the metrics emitted by these microservices can yield new insights into system health and could enable detecting anomalous conditions before they escalate into a full incident. This thesis investigates how two proposed anomaly detectors, one based on the RPCA algorithm and the other on the HTM neural network, perform on metrics emitted by a microservice infrastructure, with the goal of enhancing infrastructure monitoring. The detectors are evaluated against a random sample of metrics from a digital rights management company's microservice infrastructure, as well as the open-source NAB dataset. Both algorithms are able to detect every known incident in the company metrics tested. Their ability to detect anomalies is shown to depend on the threshold defining what qualifies as an outlier. The RPCA detector proved better at detecting anomalies in the company microservice metrics, whereas the HTM detector performed better on the NAB dataset. The findings also highlight the difficulty of manually annotating anomalies even with domain knowledge, an issue that held both for the dataset created for this project and for the NAB dataset. The thesis concludes that the proposed detectors possess different abilities, each with its respective trade-offs. Although they are similar in detection accuracy and false positive rates, they differ in their inherent suitability for tasks such as continuous monitoring or ease of deployment in an existing monitoring setup.
Scalable And Efficient Outlier Detection In Large Distributed Data Sets With Mixed-type Attributes. Koufakou, Anna, 01 January 2009
An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted by the outliers themselves. Outlier Detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis of deciding if a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data, and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, thus precluding distance computations. Additionally, these methods must be applicable to data which might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF), and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD in order to deal with very high-dimensional categorical data. 
Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state of the art, while achieving similar detection accuracy.
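The single-machine core of AVF described above is compact: score each record by the average frequency of its attribute values across the data set, and flag the lowest-scoring records as outliers. A sketch assuming purely categorical rows; the MapReduce (MR-AVF) and mixed-type (ODMAD) extensions are not reproduced here:

```python
from collections import Counter

def avf_scores(rows):
    """Attribute Value Frequency: a record's score is the mean
    frequency of its attribute values over the data set; records
    with rare values get low scores and are candidate outliers."""
    # one frequency table per attribute (column)
    counts = [Counter(col) for col in zip(*rows)]
    return [sum(counts[j][v] for j, v in enumerate(row)) / len(row)
            for row in rows]
```

Because AVF needs only per-attribute frequency counts, it avoids pairwise distance computations entirely, which is what makes the distributed and high-dimensional variants feasible.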
Semi-supervised anomaly detection in mask writer servo logs : An investigation of semi-supervised deep learning approaches for anomaly detection in servo logs of photomask writers / Semiövervakad anomalidetektion i maskritares servologgar : En undersökning av semiövervakade djupinlärningsmetoder för anomalidetektion i servologgar av fotomaskritare. Liiv, Toomas, January 2023 (has links)
Semi-supervised anomaly detection is the setting where, in addition to a set of nominal (predominantly normal) samples, a small set of labeled anomalies is available at training time. In contrast to supervised defect classification, these methods do not learn the anomaly class directly and should generalize better as new kinds of anomalies appear at test time. This is applied in an industrial defect detection context in the logs of photomask writers. Four methods are compared: two semi-supervised one-class anomaly detection methods, Deep Semi-Supervised Anomaly Detection (DeepSAD) and the hypersphere classifier (HSC), and two baselines, a reconstructive GAN method based on the Dual Autoencoder GAN (DAGAN) and a non-learned distance method based on the Kullback-Leibler divergence. Results show that semi-supervision increases the performance of DeepSAD and HSC, as measured by ROC AUC and PRO AUC, but at the tested supervision levels it does not surpass the performance of DAGAN. Furthermore, autoencoder pretraining increases the performance of HSC much as it does for DeepSAD, even though only the latter is recommended in the literature. Lastly, soft labels are utilized for HSC, but results show that this has no effect or a negative effect on performance.
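Once a feature representation is fixed, both DeepSAD and HSC score a point by its distance to a center of normality. A minimal sketch using the identity feature map; the learned network, the pretraining, and the labeled-anomaly loss terms are all assumed away here:

```python
import numpy as np

def hypersphere_scores(train_normal, X):
    """Anomaly score = squared distance to the center of the normal
    training data. With a learned feature map phi, the same rule is
    ||phi(x) - c||^2; here phi is the identity for illustration."""
    c = np.mean(np.asarray(train_normal, dtype=float), axis=0)
    X = np.asarray(X, dtype=float)
    return np.sum((X - c) ** 2, axis=1)
```

In DeepSAD and HSC, training pulls normal samples toward the center and pushes the few labeled anomalies away from it; the scoring rule above is what remains at test time.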
Robot Proficiency Self-Assessment Using Assumption-Alignment Tracking. Cao, Xuan, 01 April 2024 (has links) (PDF)
A robot is proficient if its performance for its task(s) satisfies a specific standard. While the design of autonomous robots often emphasizes such proficiency, another important attribute of autonomous robot systems is their ability to evaluate their own proficiency. A robot should be able to conduct proficiency self-assessment (PSA), i.e., assess how well it can perform a task before, during, and after it has attempted the task. We propose the assumption-alignment tracking (AAT) method, which provides time-indexed assessments of the veracity of robot generators' assumptions, for designing autonomous robots that can effectively evaluate their own performance. AAT can be considered a general framework for using robot sensory data to extract useful features, which are then used to build data-driven PSA models. We develop various AAT-based data-driven approaches to PSA from different perspectives. First, we use AAT for estimating robot performance. AAT features encode how the robot's current running condition varies from the normal condition, which correlates with the deviation between the robot's current performance and its normal performance. We use the k-nearest neighbor algorithm to model that correlation. Second, AAT features are used for anomaly detection. We treat anomaly detection as a one-class classification problem where only data from the robot operating in normal conditions are used in training, decreasing the burden of acquiring data in various abnormal conditions. The cluster boundary of data points from normal conditions, which serves as the decision boundary between normal and abnormal conditions, can be identified by mainstream one-class classification algorithms. Third, we improve PSA models that predict robot success/failure by introducing meta-PSA models that assess the correctness of PSA models.
The probability that a PSA model's prediction is correct is conditioned on four features: 1) the mean distance from a test sample to its nearest neighbors in the training set; 2) the predicted probability of success made by the PSA model; 3) the ratio between the robot's current performance and its performance standard; and 4) the percentage of the task the robot has already completed. Meta-PSA models trained on the four features using a Random Forest algorithm improve PSA models with respect to both discriminability and calibration. Finally, we explore how AAT can be used to generate a new type of explanation of robot behavior/policy from the perspective of a robot's proficiency. AAT provides three pieces of information for explanation generation: (1) veracity assessment of the assumptions on which the robot's generators rely; (2) proficiency assessment measured by the probability that the robot will successfully accomplish its task; and (3) counterfactual proficiency assessment computed with the veracity of some assumptions varied hypothetically. The information provided by AAT fits the situation awareness-based framework for explainable artificial intelligence. The efficacy of AAT is comprehensively evaluated using robot systems with a variety of robot types, generators, hardware, and tasks, including a simulated robot navigating in a maze-based (discrete time) Markov chain environment, a simulated robot navigating in a continuous environment, and both a simulated and a real-world robot arranging blocks of different shapes and colors in a specific order on a table.
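The one-class detection step and the nearest-neighbor distance feature described above can be sketched with a simple k-NN rule over normal-condition training data. This is a generic stand-in for the one-class boundary, not the thesis's models; `k`, the threshold, and the toy data are illustrative assumptions, and AAT feature extraction itself is assumed:

```python
import numpy as np

def knn_one_class(train, X, k=3, threshold=1.0):
    """One-class anomaly detection: a test point is anomalous when
    its mean distance to the k nearest training points (all drawn
    from normal operating conditions) exceeds a threshold."""
    train = np.asarray(train, dtype=float)
    X = np.asarray(X, dtype=float)
    flags = []
    for x in X:
        d = np.linalg.norm(train - x, axis=1)
        flags.append(bool(np.sort(d)[:k].mean() > threshold))
    return flags
```

Because only normal-condition data is needed for training, the same pattern supports the first meta-PSA feature above: the mean distance from a test sample to its nearest training neighbors.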
Adaptive detection of anomalies in fuel system of Saab 39 Gripen using machine learning : Investigating methods to improve anomaly detection of selected signals in the fuel system of Gripen E. Ahlgren Bergström, Olof, January 2022 (has links)
Flying fighter jets naturally involves tough environments and manoeuvres where temperatures, pressures and forces all have a large impact on the aircraft. Part degeneration and general wear and tear greatly affect the aircraft's functionality, and it is important to carefully monitor the condition of an aircraft in order to avoid catastrophic accidents. This project therefore investigates various ways to improve anomaly detection of selected signals in the Gripen E fuel system. The methodology was to compare collected flight data with data generated by a simulation model. The method was applied to three selected signals with different properties, namely the transfer pump outlet pressure and flow, as well as the fuel mass in tank 2. A neural network was trained to predict the residual between measured and simulated flight data, together with a RandomForestRegressor to create a confidence interval for that signal. This made it possible to detect signal abnormalities when the gathered flight data deviated heavily from the machine learning predictions, thus raising an alarm for anomalies. Investigated improvements include feature selection, adding artificial signals to facilitate training of the machine learning algorithms, and filtering. A large part of the work was also to see how an improved simulation model, and thus more accurate simulation data, would affect the anomaly detection; considerable effort was put into improving the simulation model and investigating this area. In addition, the data balancing and the features used to balance the data were revised. A significant challenge in this project was to map the modelling difficulties caused by differences in signal properties.
A by-product of improving the anomaly detection was a general method for creating an anomaly detection model for an arbitrarily chosen signal in the fuel system, regardless of its properties. Results show that the anomaly detection model was improved, with the choice of features being the main area of improvement. Improving the simulation model did not improve anomaly detection for the transfer pump outlet pressure and flow, but it did slightly facilitate anomaly detection for the fuel mass in tank 2. It is also concluded that signal properties can greatly affect the anomaly detection models, as accumulated effects in a signal can complicate detection. Remaining improvement areas such as filtering and the addition of artificial signals can be helpful but need to be examined for each signal. A stochastic behaviour was also observed in the data balancing process that could skew results if not handled properly. Across the three selected signals, only one flight was misclassified as an anomaly, which is a strong result.
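The core alarm logic, comparing measured flight data against simulation output and flagging residuals that leave a confidence band, can be sketched as follows. This is deliberately simplified: `band` is a fixed array standing in for the per-sample interval the thesis derives from a neural-network prediction and a RandomForestRegressor:

```python
import numpy as np

def residual_alarms(measured, simulated, band):
    """Raise an alarm at every index where the residual between
    measured and simulated signals falls outside the given band."""
    residual = np.asarray(measured, dtype=float) - np.asarray(simulated, dtype=float)
    band = np.asarray(band, dtype=float)
    return np.flatnonzero(np.abs(residual) > band).tolist()
```

A per-sample band (rather than a single constant) is what lets the detector adapt to flight phases where larger model-versus-measurement discrepancies are normal.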
Coronary Artery Plaque Segmentation with CTA Images Based on Deep Learning / Segmentering baserad på djupinlärning i CTA-bilder av plack i kransartärer. Shuli, Zhang, January 2022 (has links)
Atherosclerotic plaque is currently the leading cause of coronary artery disease (CAD). With the help of CT images, we can identify the size and type of plaque, which helps doctors make a correct diagnosis. To do this, we need to segment coronary plaques from CT images. However, plaque segmentation remains challenging because it demands considerable time and effort from radiologists. With the development of technology, segmentation algorithms based on deep learning have been applied in this field. These algorithms tend to be fully automated and achieve high segmentation accuracy, showing great potential. In this thesis, we use a deep learning method to segment plaques from 3D cardiac CT images. The work is implemented in two steps. The first is to extract the coronary artery from the CT image with the help of a U-Net. In the second, a fully convolutional network is used to segment the plaques from the artery. In each part, the algorithm undergoes 5-fold cross-validation. In the first part, we achieve a Dice coefficient of 0.8954. In the second, we achieve an AUC score of 0.9202, which is higher than the autoencoder method and very close to the state-of-the-art method.
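The Dice coefficient reported for the artery-extraction stage has a standard definition; a numpy sketch (not code from the thesis), with a small epsilon as a common smoothing assumption to avoid division by zero on empty masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks:
    2 * |pred AND target| / (|pred| + |target|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Dice is preferred over plain pixel accuracy for vessel masks because the foreground occupies only a tiny fraction of a cardiac CT volume.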