91

Scalable Validation of Data Streams

Xu, Cheng January 2016 (has links)
In manufacturing industries, sensors are often installed on industrial equipment, generating high volumes of data in real time. To shorten machine downtime and reduce maintenance costs, it is critical to analyze such streams efficiently in order to detect abnormal behavior of the equipment. For validating data streams to detect anomalies, a data stream management system called SVALI is developed. Based on requirements from the application domain, different stream window semantics are explored and an extensible set of window-forming functions is implemented, where dynamic registration of window aggregations allows incremental evaluation of aggregate functions over windows. To facilitate stream validation at a high level, the system provides two second-order system validation functions, model-and-validate and learn-and-validate. Model-and-validate allows the user to define mathematical models based on physical properties of the monitored equipment, while learn-and-validate builds statistical models by sampling the stream in real time as it flows. To validate geographically distributed equipment with short response times, SVALI is a distributed system where many SVALI instances can be started and run in parallel on board the equipment. Central analyses are made at a monitoring center, where streams of detected anomalies are combined and analyzed on a cluster computer. SVALI is an extensible system where functions can be implemented using external libraries written in C, Java, and Python without any modification of the original code. The system and the developed functionality have been applied to several applications, both industrial and in sports analytics.
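The incremental window aggregation described above can be illustrated with a short sketch. This is not SVALI's actual API (which the abstract does not show); it is a minimal Python stand-in, assuming a count-based sliding window and a simple model-and-validate style threshold test:

```python
from collections import deque

class SlidingWindowValidator:
    """Count-based sliding window with an incrementally maintained mean.

    Hypothetical illustration; SVALI's real window-forming functions and
    validation API are not described in the abstract.
    """
    def __init__(self, size, model_fn):
        self.size = size          # number of samples per window
        self.model_fn = model_fn  # expected value from a physical model
        self.window = deque()
        self.total = 0.0

    def add(self, value):
        # Incremental aggregate: update a running sum instead of rescanning.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

    def validate(self, value, tolerance=0.1):
        """Flag an anomaly when the window mean deviates from the model."""
        mean = self.add(value)
        expected = self.model_fn()
        return abs(mean - expected) > tolerance * abs(expected)

# Example: the physical model predicts a reading of 20.0; readings drift upward.
validator = SlidingWindowValidator(size=5, model_fn=lambda: 20.0)
for reading in [20.1, 19.8, 20.3, 24.0, 25.5, 26.2]:
    if validator.validate(reading):
        print("anomaly at reading", reading)
```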
92

Detecting anomalies in multivariate time series from automotive systems

Theissler, Andreas January 2013 (has links)
In the automotive industry, test drives are conducted during the development of new vehicle models or as part of quality assurance for series vehicles. During the test drives, data is recorded for use in fault analysis, resulting in millions of data points. Since multiple vehicles are tested in parallel, the amount of data to be analysed is tremendous; manually analysing each recording is therefore not feasible. Furthermore, the complexity of vehicles is ever-increasing, leading to an increase in the volume and complexity of the recordings. Only with effective means of analysing the recordings can one make sure that the effort put into conducting test drives pays off. Consequently, effective means of test drive analysis can become a competitive advantage. This thesis researches ways to detect unknown or unmodelled faults in recordings from test drives, with the following two aims: (1) in a database of recordings, the expert shall be pointed to potential errors by reporting anomalies, and (2) the time required for the manual analysis of one recording shall be shortened. The idea behind the first aim is to learn the normal behaviour from a training set of recordings and then to autonomously detect anomalies. The one-class classifier “support vector data description” (SVDD) is identified as most suitable, though it suffers from the need to specify parameters beforehand. One main contribution of this thesis is a new autonomous parameter tuning approach, making SVDD applicable to the problem at hand. Another vital contribution is a novel approach enhancing SVDD to work with multivariate time series. The outcome is the classifier “SVDDsubseq”, which is directly applicable to test drive data without the need for expert knowledge to configure or tune the classifier. The second aim is achieved by adapting visual data mining techniques to make the manual analysis of test drives more efficient. The methods of “parallel coordinates” and “scatter plot matrices” are enhanced by sophisticated filter and query operations, combined with a query tool that allows the user to graphically formulate search patterns. As a combination of the autonomous classifier “SVDDsubseq” and user-driven visual data mining techniques, a novel, data-driven, semi-autonomous approach to detect unmodelled faults in recordings from test drives is proposed and successfully validated on recordings from test drives. The methodologies in this thesis can be used as a guideline when setting up an anomaly detection system for one's own vehicle data.
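SVDDsubseq itself is not reproduced here. As a rough, hypothetical illustration of the underlying idea, the sketch below slices a multivariate time series into overlapping subsequences and trains a one-class SVM (an RBF-kernel relative of SVDD) on normal recordings only; the data, window length, and parameter values are assumptions, and the thesis tunes its parameters autonomously rather than by hand:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def subsequences(ts, length, step=1):
    """Slice a multivariate time series (samples x channels) into
    flattened, overlapping subsequences."""
    return np.array([ts[i:i + length].ravel()
                     for i in range(0, len(ts) - length + 1, step)])

# Illustrative data: two channels of normal behaviour for training...
rng = np.random.default_rng(0)
train_ts = rng.normal(0.0, 1.0, size=(500, 2))
# ...and a test drive with an injected fault between samples 200 and 220.
test_ts = rng.normal(0.0, 1.0, size=(400, 2))
test_ts[200:220] += 6.0

X_train = subsequences(train_ts, length=20)
X_test = subsequences(test_ts, length=20)

# One-class SVM with an RBF kernel as a stand-in for SVDD; nu and gamma
# are chosen by hand here only for the example.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
labels = clf.predict(X_test)          # +1 = normal, -1 = anomalous
print("anomalous subsequences start at:", np.where(labels == -1)[0])
```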
93

Anomaly Detection in an e-Transaction System using Data Driven Machine Learning Models : An unsupervised learning approach in time-series data

Avdic, Adnan, Ekholm, Albin January 2019 (has links)
Background: Detecting anomalies in time-series data is a task that can be done with the help of data-driven machine learning models. This thesis investigates if, and how well, different machine learning models with an unsupervised approach can detect anomalies in the e-Transaction system Ericsson Wallet Platform. The anomalies in our domain context are delays in the system. Objectives: The objective of this thesis is to compare four different machine learning models in order to find the most relevant one. The best performing models are determined by the evaluation metric F1-score. An intersection of the best models is also evaluated in order to decrease the number of false positives and thereby make the detection more precise. Methods: A relevant time-series data sample with 10-minute-interval data points from the Ericsson Wallet Platform was used. A number of steps were taken, such as data handling, pre-processing, normalization, training and evaluation. Two relevant features were trained separately as one-dimensional data sets. The two features used in this thesis, which are relevant for finding delays in the system, are Mean wait (ms) and Mean * N, where N is the number of calls to the system. The evaluation metrics used are true positives, true negatives, false positives, false negatives, accuracy, precision, recall, F1-score and the Jaccard index. The Jaccard index reveals how similar the algorithms are in their detections; since detection is binary, each data point in the time series is classified. Results: The results reveal the two best performing models with regard to the F1-score. The intersection evaluation reveals if and how well a combination of the two best performing models can reduce the number of false positives. Conclusions: The conclusion of this work is that some algorithms perform better than others, and it is a proof of concept that such classification algorithms can separate normal from non-normal behavior in the domain of the Ericsson Wallet Platform.
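As a small illustration of the evaluation metrics mentioned above, the following sketch computes the F1-score and the Jaccard index for two hypothetical binary anomaly detectors and evaluates their intersection; all label values are made up for the example:

```python
import numpy as np
from sklearn.metrics import f1_score, jaccard_score

# Hypothetical binary labels: 1 = delay (anomaly), 0 = normal, one label per
# 10-minute data point. These values are illustrative only.
y_true   = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_model1 = np.array([0, 1, 1, 1, 0, 0, 0, 0, 0, 1])
y_model2 = np.array([0, 0, 1, 0, 0, 1, 1, 0, 0, 1])

print("model 1 F1:", round(f1_score(y_true, y_model1), 3))
print("model 2 F1:", round(f1_score(y_true, y_model2), 3))

# Jaccard index between the two detections: how similar their alarms are.
print("Jaccard(model 1, model 2):", round(jaccard_score(y_model1, y_model2), 3))

# Intersection of the two detectors: a point is flagged only when both agree,
# which trades recall for fewer false positives.
y_both = y_model1 & y_model2
print("intersection F1:", round(f1_score(y_true, y_both), 3))
```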
94

Anomaly Detection for Machine Diagnostics : Using Machine learning approach to detect motor faults / Anomalidetektion för maskindiagnostik

Meszaros, Christopher, Wärn, Fabian January 2019 (has links)
Machine diagnostics is usually done via condition monitoring (CM). This approach analyses certain thresholds or patterns for diagnostic purposes. It can be costly and time-consuming for industries, and a larger downside is the difficulty of generalizing CM to a wider set of machines. There is a new trend of using a machine learning (ML) approach to diagnose machine states. An ML approach would implement an autonomous system for diagnosing machines, and it is highly desirable within industry to replace the manual labor performed when setting up CM systems. Often the ML algorithms chosen are novelty/anomaly based; it is a popular hypothesis that anomalous measurements are a natural byproduct of a machine in a faulty state. The purpose of this thesis is to help CombiQ with an implementation strategy for a fault detection system. The idea of the fault detection system is to make predictions for the machines within the system; more specifically, the prediction will inform whether a machine is in a faulty state or a normal state. An ML approach will be implemented to predict anomalous measurements that correspond to a faulty state. The system will have no previous data on the machines. However, data for a machine will be acquired once sensors (designed by CombiQ) have been set up for the machine. The results of the thesis propose an unsupervised and a semi-supervised approach for creating the ML models used for the fault detection system. The unsupervised approach relies on assumptions when selecting the hyperparameters for the ML models. The semi-supervised approach tries to learn the hyperparameters through cross-validation and grid search. An experiment was set up to check whether three ML algorithms can learn optimal hyperparameter values for predicting rotational unbalance. The algorithm known as OneClassSVM showed the best precision results and hence proved most useful for CombiQ's criterion.
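The hyperparameter search can be sketched as a plain grid search validated on a small labelled set. This is not CombiQ's actual pipeline; the features, grid values, and use of precision as the selection criterion are illustrative assumptions:

```python
import numpy as np
from itertools import product
from sklearn.svm import OneClassSVM
from sklearn.metrics import precision_score

rng = np.random.default_rng(1)
# Illustrative vibration features: normal data for fitting, plus a small
# labelled validation set (the "semi-supervised" part) with known unbalance.
X_normal = rng.normal(0.0, 1.0, size=(300, 3))
X_val = np.vstack([rng.normal(0.0, 1.0, size=(40, 3)),      # normal
                   rng.normal(5.0, 1.0, size=(10, 3))])     # unbalance
y_val = np.array([0] * 40 + [1] * 10)                       # 1 = faulty

best = (None, -1.0)
for nu, gamma in product([0.01, 0.05, 0.1], [0.01, 0.1, 1.0]):
    clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_normal)
    # OneClassSVM outputs -1 for outliers; map to 1 = faulty.
    pred = (clf.predict(X_val) == -1).astype(int)
    p = precision_score(y_val, pred, zero_division=0)
    if p > best[1]:
        best = ((nu, gamma), p)

print("best (nu, gamma):", best[0], "validation precision:", round(best[1], 3))
```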
95

Algorithmic trading surveillance : Identifying deviating behavior with unsupervised anomaly detection

Larsson, Frans January 2019 (has links)
The financial markets are no longer what they used to be, and one reason for this is the breakthrough of algorithmic trading. Although this has had several positive effects, there have been recorded incidents in which algorithms were involved. It is therefore of interest to find effective methods to monitor algorithmic trading. The purpose of this thesis was therefore to contribute to this research area by investigating whether machine learning can be used for detecting deviating behavior. Since the real-world data set used in this study lacked labels, an unsupervised anomaly detection approach was chosen. Two models, isolation forest and deep denoising autoencoder, were selected and evaluated. Because the data set lacked labels, artificial anomalies were injected into the data set to make evaluation of the models possible. These synthetic anomalies were generated by two different approaches, one based on a downsampling strategy and one based on manual construction and modification of real data. The evaluation of the anomaly detection models shows that both isolation forest and deep denoising autoencoder outperform a trivial baseline model and have the ability to detect deviating behavior. Furthermore, it is shown that a deep denoising autoencoder outperforms isolation forest with respect to both the area under the receiver operating characteristic curve and the area under the precision-recall curve. A deep denoising autoencoder is therefore recommended for the purpose of algorithmic trading surveillance.
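As a minimal illustration of one of the evaluated models, the sketch below fits an isolation forest on unlabeled "normal" data, scores a set containing injected anomalies, and reports the two area-under-curve metrics used in the thesis; the features and the injection scheme are assumptions, not the thesis' data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(2)
# Illustrative order-flow features (e.g. order size, rate, cancel ratio);
# the real surveillance data and the thesis' anomaly injection are not shown.
X_normal = rng.normal(0.0, 1.0, size=(2000, 4))
X_anom = rng.normal(0.0, 4.0, size=(40, 4))      # injected deviating behaviour
X = np.vstack([X_normal, X_anom])
y = np.array([0] * len(X_normal) + [1] * len(X_anom))

clf = IsolationForest(n_estimators=200, contamination="auto",
                      random_state=0).fit(X_normal)
# score_samples is higher for inliers, so negate it to get an anomaly score.
scores = -clf.score_samples(X)

print("ROC AUC:", round(roc_auc_score(y, scores), 3))
print("PR  AUC:", round(average_precision_score(y, scores), 3))
```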
96

Machine Anomaly Detection using Sound Spectrogram Images and Neural Networks

Hanjun Kim (6947996) 14 August 2019 (has links)
Sound and vibration analysis is a prominent tool used for scientific investigations in various fields, such as structural model identification or dynamic behavior studies. In manufacturing, vibration signals collected through commercial sensors are utilized to monitor machine health, for sustainable and cost-effective manufacturing. Recently, the development of commercial sensors and computing environments has encouraged researchers to combine the gathered data with Machine Learning (ML) techniques, which have been proven to be efficient for categorical classification problems. These discriminative algorithms have been successfully implemented in monitoring problems in factories by simulating faulty situations. However, it is difficult to identify all the sources of anomalies in a real environment. In this paper, a Neural Network (NN) application on a KUKA KR6 robot arm is introduced as a solution to the limitations described above. Specifically, the autoencoder architecture was implemented for anomaly detection, which does not require the predefinition of faulty signals in the training process. In addition, stethoscopes were utilized as alternative sensing tools, as they are easy to handle and provide a cost-effective monitoring solution. To simulate normal and abnormal conditions, different load levels were applied at the end of the robot arm according to its load capacity. Sound signals were recorded from joints of the robot arm, and meaningful features were extracted from spectrograms of the sound signals. The features were utilized to train and test autoencoders. During the autoencoder process, reconstruction errors (REs) between the autoencoder’s input and output were computed. Since the autoencoders were trained only with features corresponding to normal conditions, RE values for abnormal features tend to be higher than those for normal features. In each autoencoder, the distributions of the RE values were compared to set a threshold which distinguishes abnormal states from normal states. As a result, it is suggested that the threshold on RE values can be utilized to determine the condition of the robot arm.
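The reconstruction-error thresholding described above can be sketched with a generic autoencoder. The network below (a bottlenecked MLP trained to reproduce its input) is a stand-in, not the thesis' architecture, and the spectrogram features are simulated:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
# Illustrative spectrogram features (e.g. mean band energies); the thesis'
# actual feature extraction and network architecture are not shown here.
X_normal = rng.normal(0.0, 1.0, size=(500, 16))
X_faulty = rng.normal(1.5, 1.0, size=(50, 16))   # stand-in for overload data

# An MLP trained to reproduce its input acts as a simple autoencoder;
# the small hidden layer is the bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(X_normal, X_normal)

def reconstruction_error(model, X):
    return np.mean((model.predict(X) - X) ** 2, axis=1)

re_normal = reconstruction_error(ae, X_normal)
re_faulty = reconstruction_error(ae, X_faulty)

# Threshold chosen from the distribution of normal REs (e.g. 99th percentile).
threshold = np.percentile(re_normal, 99)
print("flagged faulty samples:", int(np.sum(re_faulty > threshold)),
      "of", len(X_faulty))
```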
97

Détection d’anomalies dans les séries temporelles : application aux masses de données sur les pneumatiques / Outlier detection for time series data : application to tyre data

Benkabou, Seif-Eddine 21 March 2018 (has links)
Anomaly detection is a crucial task that has attracted the interest of several research studies in the machine learning and data mining communities. The complexity of this task depends on the nature of the data, the availability of labels and the application framework in which the data arise. In this thesis, we address this problem for complex data and particularly for univariate and multivariate time series. The term "anomaly" can refer to an observation that deviates from other observations so much as to arouse suspicion that it was generated by a different process. More generally, the underlying problem (also called novelty detection or outlier detection) aims to identify, in a set of data, those items which differ significantly from the others, which do not conform to an "expected behavior" (which could be defined or learned), and which indicate a different generating mechanism. The "abnormal" patterns thus detected often carry critical information. We focus on two particular aspects of unsupervised anomaly detection from time series. The first is global and consists in detecting time series that are abnormal with respect to an entire database, whereas the second is contextual and aims to detect, locally, points that are abnormal with respect to the overall structure of the time series under study. To this end, we propose optimization approaches based on weighted clustering and time warping for global detection, and matrix-based modeling for contextual detection. Finally, we present several empirical studies on public data to validate the proposed approaches and compare them with other approaches known in the literature. In addition, an experimental validation is provided on a real problem, concerning the detection of outlier price time series in tyre data, to meet the needs expressed by LIZEO, the industrial partner of this thesis.
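The thesis' weighted-clustering formulation is not reproduced here. As a naive stand-in for global detection, the sketch below scores each series by its mean dynamic time warping (DTW) distance to all other series in a small synthetic collection; the data and the scoring rule are assumptions made for illustration only:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

rng = np.random.default_rng(4)
t = np.linspace(0, 2 * np.pi, 50)
# Illustrative "price" series: most follow the same seasonal shape,
# one deviates strongly and should receive the highest score.
series = [np.sin(t + rng.normal(0, 0.2)) + rng.normal(0, 0.05, 50)
          for _ in range(9)]
series.append(np.cos(3 * t))   # the injected outlier series

# Global anomaly score: mean DTW distance to all other series.
scores = [np.mean([dtw_distance(s, o) for o in series if o is not s])
          for s in series]
print("most anomalous series index:", int(np.argmax(scores)))
```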
98

Machine learning to detect anomalies in datacenter

Lindh, Filip January 2019 (has links)
This thesis investigates the possibility of using anomaly detection on performance data from virtual servers in a datacenter to detect malfunctioning servers. Using anomaly detection can potentially reduce the time a server is malfunctioning, as the server can be detected and checked before the error has a significant impact. Several approaches and methods were applied and evaluated on one virtual server: the K-nearest neighbor algorithm, the support-vector machine, the K-means clustering algorithm, self-organizing maps, the CPU-memory usage ratio using a Gaussian model, and time series analysis using a neural network and linear regression. The evaluation and comparison of the methods were mainly based on errors reported during the time period in which they were tested: the better the detected anomalies matched the reported errors, the higher the score a method received. It turned out that anomalies in performance data could be linked to real errors in the server to some extent. This makes it possible to use anomaly detection on performance data as a way to detect malfunctioning servers. The simplest method, looking at the ratio between memory and CPU usage, was the most successful one, detecting the most errors. However, the anomalies were often detected just after the error had been reported. The support-vector machine was more successful at detecting anomalies before they were reported. The proportion of anomalies played a big role, however, and K-nearest neighbor received a higher score when the proportion of anomalies was higher.
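The most successful method, the Gaussian model on the memory-to-CPU ratio, can be sketched in a few lines. The performance values, fitting window, and 3-sigma threshold below are illustrative assumptions, not the thesis' exact configuration:

```python
import numpy as np

rng = np.random.default_rng(5)
# Illustrative per-minute samples: memory usage (%) and CPU usage (%).
memory = rng.normal(60, 5, 1000)
cpu = rng.normal(30, 5, 1000).clip(min=1)
cpu[700:720] = 95            # injected fault: CPU spikes while memory stays flat

ratio = memory / cpu

# Fit a Gaussian to a known-normal period and flag points far from the mean.
mu, sigma = ratio[:500].mean(), ratio[:500].std()
anomalies = np.where(np.abs(ratio - mu) > 3 * sigma)[0]
print("anomalous minutes:", anomalies[:10], "...")
```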
99

Detecção de anomalias utilizando métodos paramétricos e múltiplos classificadores / Anomaly detection using parametric methods and multiple classifiers

Costa, Gabriel de Barros Paranhos da 25 August 2014 (has links)
Anomalies or outliers are examples, or groups of examples, that exhibit behaviour different from the expected. In practice, these examples may represent diseases in individuals or populations, as well as other events such as fraud in banking operations and system failures. Several existing techniques seek to identify these anomalies, including adaptations of classification methods, statistical methods and methods based on information theory. The main challenges are the unbalanced number of examples in each class, the cases in which anomalies are disguised among normal samples, and the definition of normal behaviour together with the formalization of a model for that behaviour. In this dissertation, we propose the use of a new space in which to perform the detection, called the parameter space. A parameter space is created using parameters estimated from the concatenation (chaining) of two examples. We then present a new framework to perform anomaly detection through the fusion of detectors that use convex hulls in multiple parameter spaces. The method is considered a framework because it is possible to choose which parameter spaces are used according to the behaviour of the target data set. For the experiments, two parameter spaces were used (mean and standard deviation; mean, variance, skewness and kurtosis) and the results were compared with some commonly used anomaly detection methods. The results achieved were comparable to or better than those obtained by the other methods. Furthermore, we believe that the use of parameter spaces gives the proposed method great flexibility, since the user can choose a parameter space that suits the application. Both the flexibility and the extensibility provided by parameter spaces, together with the good performance of the proposed method in the experiments, make parameter spaces and, more specifically, the proposed methods appealing for solving anomaly detection problems.
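A minimal sketch of the parameter-space idea follows, using the (mean, standard deviation) space and a convex hull test via a Delaunay triangulation. It omits the dissertation's fusion of multiple detectors, and the data and pairing scheme are assumptions made for illustration:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(6)

def pair_features(x, y):
    """Map the concatenation of two examples to the (mean, std) parameter space."""
    z = np.concatenate([x, y])
    return np.array([z.mean(), z.std()])

# Illustrative raw examples: normal samples around 0, anomalies around 4.
normal = rng.normal(0.0, 1.0, size=(100, 8))
anomalies = rng.normal(4.0, 1.0, size=(5, 8))

# Build the parameter space from pairs of normal training examples.
train_points = np.array([pair_features(normal[i], normal[j])
                         for i in range(40) for j in range(i + 1, 40)])
hull = Delaunay(train_points)

def is_anomalous(sample, references):
    """Flag a sample if, paired with known-normal references, it falls
    outside the convex hull of the normal parameter space."""
    pts = np.array([pair_features(sample, r) for r in references])
    return bool(np.all(hull.find_simplex(pts) < 0))

refs = normal[40:50]
print("normal sample flagged:   ", is_anomalous(normal[60], refs))
print("anomalous sample flagged:", is_anomalous(anomalies[0], refs))
```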
100

Anomaly detection based on multiple streaming sensor data

Menglei, Min January 2019 (has links)
Today, the Internet of Things is widely used in various fields, such as factories, public facilities, and even homes. The use of the Internet of Things involves a large number of sensor devices that collect various types of data in real time, such as machine voltage, current, and temperature. These devices generate a large amount of streaming sensor data. These data can be analysed to discover hidden relations, for example to monitor the operating status of a machine, detect anomalies and alert the company in time to avoid significant losses. The application of anomaly detection in the field of data mining is therefore very extensive. This paper proposes an anomaly detection method based on multiple streaming sensor data and performs anomaly detection on three data sets from a real company. First, this project proposes a state transition detection algorithm, a state classification algorithm, and a frequency-based correlation analysis method. The two algorithms were then implemented in Python, and correlation analysis was carried out on the results from the system to find possible meaningful relations that can be used in anomaly detection. Finally, the accuracy and time complexity of the system were calculated, and its feasibility and scalability were evaluated. From the evaluation result, it is concluded that the method
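The state transition detection step can be sketched with simple thresholding on one sensor stream; the thesis' actual algorithms are not shown in the abstract, and the sensor values and state names below are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
# Illustrative current readings from one streaming sensor.
signal = np.concatenate([rng.normal(2, 0.1, 50),    # state "idle"
                         rng.normal(10, 0.3, 50),   # state "running"
                         rng.normal(2, 0.1, 50)])   # back to "idle"

def classify_state(value, boundaries=(5.0,), names=("idle", "running")):
    """Assign a discrete state to a reading by simple thresholding."""
    return names[int(np.searchsorted(boundaries, value))]

states = [classify_state(v) for v in signal]
transitions = [(i, states[i - 1], states[i])
               for i in range(1, len(states)) if states[i] != states[i - 1]]
print("detected state transitions:", transitions)
```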
