101 |
Détection d’anomalies dans les séries temporelles : application aux masses de données sur les pneumatiques / Outlier detection for time series data : application to tyre data. Benkabou, Seif-Eddine, 21 March 2018 (has links)
Anomaly detection is a crucial task that has attracted the interest of several research studies in the machine learning and data mining communities. The complexity of this task depends on the nature of the data, the availability of labels, and the application framework in which the data arise. In this thesis, we address this problem for complex data, and in particular for univariate and multivariate time series. The term "anomaly" can refer to an observation that deviates from other observations so much as to arouse suspicion that it was generated by a different process. More generally, the underlying problem (also called novelty detection or outlier detection) aims to identify, in a set of data, those items which differ significantly from the others, which do not conform to an "expected behavior" (which could be defined or learned automatically), and which indicate a different generating mechanism. The "abnormal" patterns thus detected often carry critical information. We focus on two particular aspects of unsupervised anomaly detection in time series. The first is global and consists in detecting time series that are abnormal with respect to an entire database, whereas the second is contextual and aims to detect, locally, the points that are abnormal with respect to the structure of the series under study. To this end, we propose optimization approaches based on weighted clustering and time warping for global detection, and matrix-based modeling for contextual detection. Finally, we present several empirical studies on public data to validate the proposed approaches and compare them with other well-known approaches from the literature. In addition, an experimental validation is provided on a real problem, the detection of outlier price series for tyres, to meet the needs expressed by LIZEO, the industrial partner of this thesis.
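For readers unfamiliar with the global detection setting, the sketch below illustrates the general idea of scoring whole series against a database under temporal alignment. It is a simplified stand-in, not the weighted-clustering optimization developed in the thesis: series are scored by their DTW distance to crudely chosen cluster medoids, and every function name, parameter and threshold is an illustrative assumption.

```python
# Minimal, illustrative sketch of global anomaly scoring for univariate time
# series using dynamic time warping (DTW) and a crude clustering step.
# NOT the optimization-based method of the thesis; names and data are invented.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def global_anomaly_scores(series_list, n_clusters=2):
    """Score each series by its DTW distance to the nearest cluster medoid."""
    N = len(series_list)
    dist = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            dist[i, j] = dist[j, i] = dtw_distance(series_list[i], series_list[j])
    # Crude medoid selection: the n_clusters series with the smallest total distance.
    medoids = np.argsort(dist.sum(axis=1))[:n_clusters]
    # Anomaly score = distance to the closest medoid (higher = more anomalous).
    return dist[:, medoids].min(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = [np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.standard_normal(50) for _ in range(9)]
    outlier = [np.linspace(-1, 1, 50)]                 # a series with a different shape
    scores = global_anomaly_scores(normal + outlier)
    print("most anomalous series index:", int(np.argmax(scores)))  # expected: 9
```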
|
102 |
Machine learning to detect anomalies in datacenter. Lindh, Filip, January 2019 (has links)
This thesis investigates the possibility of using anomaly detection on performance data of virtual servers in a datacenter to detect malfunctioning servers. Using anomaly detection can potentially reduce the time a server is malfunctioning, as the server can be detected and checked before the error has a significant impact. Several approaches and methods were applied and evaluated on one virtual server: the K-nearest neighbor algorithm, the support-vector machine, the K-means clustering algorithm, self-organizing maps, a CPU-memory usage ratio using a Gaussian model, and time series analysis using a neural network and linear regression. The evaluation and comparison of the methods were mainly based on errors reported during the time period in which they were tested: the better the detected anomalies matched the reported errors, the higher the score they received. It turned out that anomalies in performance data could, to some extent, be linked to real errors in the server. This makes it possible to use anomaly detection on performance data as a way to detect malfunctioning servers. The simplest method, looking at the ratio between memory usage and CPU usage, was the most successful one, detecting the most errors; however, the anomalies were often detected just after the error had been reported. The support-vector machine was more successful at detecting anomalies before they were reported. The proportion of anomalies played a big role, though: K-nearest neighbor received a higher score when the proportion of anomalies was higher.
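As an illustration of the simple ratio-based detector mentioned above, the sketch below fits a Gaussian to the memory/CPU ratio on presumed-healthy data and flags samples far from the mean. It is an assumed, minimal rendering of that idea, not the thesis implementation; the three-sigma threshold and synthetic data are invented.

```python
# Minimal sketch of a ratio-based detector: fit a Gaussian to the memory/CPU
# usage ratio on healthy data and flag points far from the mean.
import numpy as np

def fit_ratio_model(cpu, mem):
    """Return mean and standard deviation of the memory/CPU ratio."""
    ratio = np.asarray(mem) / np.maximum(np.asarray(cpu), 1e-9)  # avoid division by zero
    return ratio.mean(), ratio.std()

def is_anomalous(cpu, mem, mu, sigma, k=3.0):
    """Flag samples whose ratio deviates more than k standard deviations."""
    ratio = np.asarray(mem) / np.maximum(np.asarray(cpu), 1e-9)
    return np.abs(ratio - mu) > k * sigma

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cpu = rng.uniform(20, 60, 1000)             # percent CPU on a healthy server
    mem = 1.5 * cpu + rng.normal(0, 2, 1000)    # memory tracks CPU under normal load
    mu, sigma = fit_ratio_model(cpu, mem)
    print(is_anomalous([40, 40], [60, 160], mu, sigma))  # second sample should be True
```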
|
103 |
Detecção de anomalias utilizando métodos paramétricos e múltiplos classificadores / Anomaly detection using parametric methods and multiple classifiers. Costa, Gabriel de Barros Paranhos da, 25 August 2014 (has links)
Anomalies or outliers are examples, or groups of examples, that behave differently from what is expected. In practice, these examples may represent diseases in individuals or populations, as well as other events such as fraud in banking operations and system failures. Several existing techniques seek to identify these anomalies, including adaptations of classification methods, statistical methods, and methods based on information theory. The main challenges are the unbalanced number of examples in each class, the cases in which anomalies are disguised among normal samples, and the definition of normal behavior together with the formalization of a model for this behavior. In this dissertation, we propose the use of a new space to support the detection task, called the parameter space. A parameter space is created from parameters estimated over the concatenation (chaining) of two examples. We then present a new framework that performs anomaly detection by fusing detectors based on convex hulls in multiple parameter spaces. The method is considered a framework because the parameter spaces used can be chosen according to the behavior of the target data set. In the experiments, two parameter spaces were used (mean and standard deviation; mean, variance, skewness and kurtosis), and the results were compared with some commonly used anomaly detection methods. The results achieved were comparable to or better than those obtained by the other methods. Furthermore, we believe that the parameter space gives the proposed method great flexibility, since the user can choose a parameter space that best models the application. Both the flexibility and the extensibility provided by parameter spaces, together with the good performance achieved by the proposed method in the experiments, make parameter spaces and, more specifically, the proposed methods appealing for solving anomaly detection problems.
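To make the parameter-space idea concrete, the following sketch maps data windows to (mean, standard deviation) points, builds the convex hull of the normal points, and flags test points falling outside it. It illustrates a single parameter space only; the fusion of detectors across multiple spaces proposed in the dissertation is not shown, and all names and data are assumptions.

```python
# Illustrative single-parameter-space detector: (mean, std) points, convex hull
# of normal data, test points outside the hull are flagged as outliers.
import numpy as np
from scipy.spatial import Delaunay

def to_parameter_space(windows):
    """Each row of `windows` becomes a (mean, std) point."""
    w = np.asarray(windows, dtype=float)
    return np.column_stack([w.mean(axis=1), w.std(axis=1)])

def fit_hull(normal_windows):
    """Triangulate the normal points; the triangulation covers their convex hull."""
    return Delaunay(to_parameter_space(normal_windows))

def is_outlier(hull, windows):
    """True where the (mean, std) point lies outside the hull of normal data."""
    return hull.find_simplex(to_parameter_space(windows)) < 0

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    normal = rng.normal(0.0, 1.0, size=(200, 32))     # windows of normal behaviour
    odd = rng.normal(5.0, 3.0, size=(3, 32))          # shifted, noisier windows
    hull = fit_hull(normal)
    print(is_outlier(hull, odd))                       # expected: [ True  True  True]
```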
|
104 |
Anomaly detection based on multiple streaming sensor data. Menglei, Min, January 2019 (has links)
Today, the Internet of Things is widely used in various fields, such as factories, public facilities, and even homes. The use of the Internet of Things involves a large number of sensor devices that collect various types of data in real time, such as machine voltage, current, and temperature. These devices generate a large amount of streaming sensor data. These data can be analysed to discover hidden relations, which can be used to monitor the operating status of a machine, detect anomalies, and alert the company in time to avoid significant losses. The application of anomaly detection in the field of data mining is therefore very extensive. This paper proposes an anomaly detection method based on multiple streaming sensor data and performs anomaly detection on three data sets from a real company. First, the project proposes a state transition detection algorithm, a state classification algorithm, and a frequency-based correlation analysis method. The two algorithms were then implemented in Python, and a correlation analysis was carried out on the results from the system to find possibly meaningful relations that can be used in the anomaly detection. Finally, the accuracy and time complexity of the system were calculated, and its feasibility and scalability were evaluated. From the evaluation result, it is concluded that the method
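The sketch below gives a hypothetical flavour of the two building blocks named above: detecting state transitions in a streaming signal and counting how often transitions of two sensors co-occur. The thresholds, window size and state labels are assumptions for illustration, not the thesis algorithms.

```python
# Hypothetical sketch: coarse state classification, transition detection, and
# frequency-based co-occurrence counting across two sensor streams.
from collections import Counter

def to_states(values, low, high):
    """Classify each sample into a coarse state: 'low', 'mid' or 'high'."""
    return ["low" if v < low else "high" if v > high else "mid" for v in values]

def transitions(states):
    """Return (index, old_state, new_state) for every state change."""
    return [(i, states[i - 1], states[i])
            for i in range(1, len(states)) if states[i] != states[i - 1]]

def cooccurrence(trans_a, trans_b, max_lag=2):
    """Count transition pairs of two sensors that happen within max_lag samples."""
    counts = Counter()
    for ia, _, na in trans_a:
        for ib, _, nb in trans_b:
            if abs(ia - ib) <= max_lag:
                counts[(na, nb)] += 1
    return counts

if __name__ == "__main__":
    voltage = [1, 1, 1, 9, 9, 1, 1, 9, 9, 1]
    current = [0, 0, 0, 7, 7, 0, 0, 7, 7, 0]
    ta = transitions(to_states(voltage, low=3, high=6))
    tb = transitions(to_states(current, low=2, high=5))
    # Frequent ('high','high') / ('low','low') pairs suggest the sensors are correlated.
    print(cooccurrence(ta, tb))
```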
|
105 |
Cyber Attacks Detection and Mitigation in SDN Environments. January 2018 (has links)
abstract: Cyber-systems and networks are the target of different types of cyber-threats and attacks, which are becoming more common, sophisticated, and damaging. Those attacks can vary in the way they are performed; however, similar strategies and tactics are often used because they are time-proven to be effective. The motivations behind cyber-attacks play an important role in determining how attackers plan and proceed to achieve their goals. Generally, there are three categories of motivation: political, economic, and socio-cultural. This indicates that, to defend against possible attacks in an enterprise environment, it is necessary to consider what makes such an environment a target; with that understanding, we know what threats to consider and how to deploy the right defense system. In other words, detecting an attack depends on the defenders having a clear understanding of why they become targets and what possible attacks they should expect. For instance, attackers may perform Denial of Service (DoS), or even worse Distributed Denial of Service (DDoS), attacks with the intention of damaging targeted organizations and preventing legitimate users from accessing their services. In some cases, however, attackers are very skilled and try to hide in a system undetected for a long period of time, with the incentive to steal and collect data rather than to cause damage.
Nowadays, it is not only the variety of attack types and the way they are launched that matter; advancement in technology is another factor to consider. Over the last decades we have seen various new technologies emerge. In the beginning, new technologies have their own limitations before they stand out, and there are a number of related technical areas whose understanding is still less than satisfactory and in which long-term research is needed. On the other hand, these new technologies can boost the deployment of security solutions and countermeasures when they are carefully adopted. In that respect, Software Defined Networking (SDN), its related security threats and solutions, and its adoption in enterprise environments give us new opportunities to enhance our security solutions. To reach the optimal level of deploying SDN technology in enterprise environments, it is important to re-evaluate security solutions currently deployed in traditional networks before deploying them on SDN-based infrastructures. Although DDoS attacks are sinister enough, there are other types of cyber-threats that are very harmful, sophisticated, and intelligent, and current security defenses designed to detect DDoS cannot detect them. These attacks are complex, persistent, and stealthy; they are referred to as Advanced Persistent Threats (APTs) and often leverage bot control and remote access to valuable information. An APT uses multiple stages to break into a network: it is an unseen, continuous, long-term penetration of the network, and attackers can bypass existing security detection systems. It can modify and steal sensitive data as well as cause physical damage to the target system. In this dissertation, two cyber-attack motivations are considered: sabotage, where the motive is destruction; and information theft, where attackers aim to acquire invaluable information (customer information, business information, etc.). I deal with two types of attacks, DDoS attacks and APT attacks, where DDoS attacks fall under the sabotage motivation and APT attacks under the information theft motivation. To detect and mitigate each of these attacks, I utilize the ease of programmability in SDN, its suitability as an implementation platform, dynamic topology changes, decentralized network management, and ease of deploying security countermeasures. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2018
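As a hedged illustration of the kind of per-flow statistic an SDN controller could monitor for DDoS symptoms, the sketch below computes the entropy of destination IPs over recent flows and raises an alert when traffic concentrates on very few targets. It is a generic heuristic, not the detection or mitigation scheme of this dissertation, and no real controller API is used; flow records are plain dictionaries.

```python
# Generic DDoS heuristic: low entropy of destination IPs (many sources
# hammering one target) over a window of flow records triggers an alert.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of categorical values (in bits)."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ddos_suspected(flows, threshold_bits=1.0):
    """Flag when traffic concentrates on very few destination IPs."""
    return entropy([f["dst_ip"] for f in flows]) < threshold_bits

if __name__ == "__main__":
    normal = [{"dst_ip": f"10.0.0.{i % 20}"} for i in range(200)]   # spread over 20 hosts
    attack = [{"dst_ip": "10.0.0.5"} for _ in range(200)]           # one victim
    print(ddos_suspected(normal), ddos_suspected(attack))           # False True
```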
|
106 |
DATA COLLECTION FRAMEWORK AND MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS OF CYBER SECURITY ATTACKS. Unknown Date (has links)
The integrity of network communications is constantly being challenged by more sophisticated intrusion techniques. Attackers are shifting to stealthier and more complex forms of attacks in an attempt to bypass known mitigation strategies. In addition, many detection methods for popular network attacks have been developed using outdated or non-representative attack data. To effectively develop modern detection methodologies, there is a need to acquire data that can fully encompass the behaviors of persistent and emerging threats. When collecting modern-day network traffic for intrusion detection, substantial amounts of traffic can be collected, containing relatively few attack instances compared to normal traffic. This skewed distribution between normal and attack data can lead to high levels of class imbalance. Machine learning techniques can be used to aid in attack detection, but large levels of imbalance between normal (majority) and attack (minority) instances can lead to inaccurate detection results. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
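The class-imbalance issue described above can be pictured with a short, assumption-based sketch: train a classifier on synthetic traffic where attacks are rare, using class weighting so the minority class is not ignored. This shows one standard countermeasure, not the techniques developed in the dissertation; the synthetic data and parameters are invented.

```python
# Sketch of handling class imbalance by reweighting the minority (attack) class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(3)
n_normal, n_attack = 5000, 50                      # heavily imbalanced traffic
X_normal = rng.normal(0.0, 1.0, size=(n_normal, 10))
X_attack = rng.normal(2.5, 1.0, size=(n_attack, 10))
X = np.vstack([X_normal, X_attack])
y = np.array([0] * n_normal + [1] * n_attack)      # 1 = attack (minority class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# class_weight='balanced' reweights samples inversely to class frequency.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```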
|
107 |
Improving Service Level of Free-Floating Bike Sharing Systems. Pal, Aritra, 13 November 2017 (has links)
Bike sharing is a sustainable mode of urban mobility, not only for regular commuters but also for casual users and tourists. Free-floating bike sharing (FFBS) is an innovative bike sharing model which saves on start-up cost, prevents bike theft, and offers significant opportunities for smart management by tracking bikes in real time with built-in GPS. Efficient management of an FFBS requires: 1) analyzing its mobility patterns and the spatio-temporal imbalance of bike supply and demand, 2) developing strategies to mitigate such imbalances, and 3) understanding the causes of bikes getting damaged and developing strategies to minimize them. All of these operational management problems are successfully addressed in this dissertation, using tools from Operations Research and Statistical and Machine Learning, with the Share-A-Bull Bike FFBS and the Divvy station-based bike sharing system as case studies.
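As a toy illustration of the supply/demand imbalance mentioned above, the sketch below greedily moves bikes from zones with a surplus to zones with a deficit. It only conveys the rebalancing notion; it is not the optimization or machine learning models developed in the dissertation, and the zone data are invented.

```python
# Toy greedy rebalancing: move bikes from surplus zones to deficit zones.
def rebalance(supply, target):
    """Return a list of (from_zone, to_zone, bikes) moves closing the gaps."""
    surplus = {z: supply[z] - target[z] for z in supply if supply[z] > target[z]}
    deficit = {z: target[z] - supply[z] for z in supply if supply[z] < target[z]}
    moves = []
    for dz, need in deficit.items():
        for sz in list(surplus):
            if need == 0:
                break
            qty = min(need, surplus[sz])
            if qty > 0:
                moves.append((sz, dz, qty))
                surplus[sz] -= qty
                need -= qty
            if surplus[sz] == 0:
                del surplus[sz]
    return moves

if __name__ == "__main__":
    supply = {"downtown": 25, "campus": 4, "beach": 12}   # bikes currently in each zone
    target = {"downtown": 10, "campus": 15, "beach": 12}  # bikes each zone should have
    print(rebalance(supply, target))                      # [('downtown', 'campus', 11)]
```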
|
108 |
Adaptive Real-time Anomaly Detection for Safeguarding Critical Networks. Ring Burbeck, Kalle, January 2006 (has links)
Critical networks require defence in depth, incorporating many different security technologies including intrusion detection. One important intrusion detection approach is called anomaly detection, where normal (good) behaviour of users of the protected system is modelled, often using machine learning or data mining techniques. During detection, new data is matched against the normality model, and deviations are marked as anomalies. Since no knowledge of attacks is needed to train the normality model, anomaly detection may detect previously unknown attacks.

In this thesis we present ADWICE (Anomaly Detection With fast Incremental Clustering) and evaluate it in IP networks. ADWICE has the following properties:

(i) Adaptation - Rather than making use of extensive periodic retraining sessions on stored off-line data to handle changes, ADWICE is fully incremental, making very flexible on-line training of the model possible without destroying what is already learnt. When subsets of the model are no longer useful, those clusters can be forgotten.

(ii) Performance - ADWICE is linear in the number of input data, thereby heavily reducing training time compared to alternative clustering algorithms. Training time as well as detection time is further reduced by the use of an integrated search index.

(iii) Scalability - Rather than keeping all data in memory, only compact cluster summaries are used. The linear time complexity also improves the scalability of training.

We have implemented ADWICE and integrated the algorithm in a software agent. The agent is a part of the Safeguard agent architecture, developed to perform network monitoring, intrusion detection and correlation as well as recovery. We have also applied ADWICE to publicly available network data to compare our approach to related works with similar approaches. The evaluation resulted in a high detection rate at a reasonable false positive rate. / Report code: LiU-Tek-Lic-2006:12.
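The sketch below conveys, in heavily simplified form, the idea of incremental clustering with compact cluster summaries that the abstract describes. It is not ADWICE itself (there is no search index and no cluster forgetting); the distance threshold, class names and data are assumptions.

```python
# Simplified incremental clustering with compact summaries (count, linear sum,
# squared sum); points far from every cluster centroid are flagged as anomalies.
import numpy as np

class IncrementalClusters:
    def __init__(self, radius):
        self.radius = radius
        self.summaries = []                 # each summary: [n, linear_sum, squared_sum]

    def _centroid(self, s):
        return s[1] / s[0]

    def train(self, x):
        """Absorb a normal data point into the nearest cluster, or open a new one."""
        x = np.asarray(x, dtype=float)
        if self.summaries:
            d = [np.linalg.norm(x - self._centroid(s)) for s in self.summaries]
            i = int(np.argmin(d))
            if d[i] <= self.radius:
                s = self.summaries[i]
                s[0] += 1
                s[1] += x
                s[2] += x * x
                return
        self.summaries.append([1, x.copy(), x * x])

    def is_anomaly(self, x):
        """A point far from every cluster centroid is flagged as anomalous."""
        x = np.asarray(x, dtype=float)
        return all(np.linalg.norm(x - self._centroid(s)) > self.radius
                   for s in self.summaries)

if __name__ == "__main__":
    model = IncrementalClusters(radius=1.5)
    rng = np.random.default_rng(4)
    for point in rng.normal(0.0, 0.3, size=(500, 2)):    # stream of normal feature vectors
        model.train(point)
    print(model.is_anomaly([0.1, 0.2]), model.is_anomaly([6.0, 6.0]))  # False True
```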
|
109 |
Anomaly detection in unknown environments using wireless sensor networks. Li, YuanYuan, 01 May 2010 (has links)
This dissertation addresses the problem of distributed anomaly detection in Wireless Sensor Networks (WSN). A challenge of designing such systems is that the sensor nodes are battery powered, often have different capabilities and generally operate in dynamic environments. Programming such sensor nodes at a large scale can be a tedious job if the system is not carefully designed. Data modeling in distributed systems is important for determining the normal operation mode of the system. Being able to model the expected sensor signatures for typical operations greatly simplifies the human designer’s job by enabling the system to autonomously characterize the expected sensor data streams. This, in turn, allows the system to perform autonomous anomaly detection to recognize when unexpected sensor signals are detected. This type of distributed sensor modeling can be used in a wide variety of sensor networks, such as detecting the presence of intruders, detecting sensor failures, and so forth. The advantage of this approach is that the human designer does not have to characterize the anomalous signatures in advance.
The contributions of this approach include: (1) providing a way for a WSN to autonomously model sensor data with no prior knowledge of the environment; (2) enabling a distributed system to detect anomalies in both sensor signals and temporal events online; (3) providing a way to automatically extract semantic labels from temporal sequences; (4) providing a way for WSNs to save communication power by transmitting compressed temporal sequences; (5) enabling the system to detect time-related anomalies without prior knowledge of abnormal events; and, (6) providing a novel missing data estimation method that utilizes temporal and spatial information to replace missing values. The algorithms have been designed, developed, evaluated, and validated experimentally in synthesized data, and in real-world sensor network applications.
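Contribution (6), estimating missing readings from temporal and spatial information, can be pictured with the minimal sketch below, which averages a node's neighbouring samples in time with the other nodes' readings at the same time step. The equal weighting is an assumption for illustration and not the method developed in this work.

```python
# Minimal spatio-temporal estimation of a missing sensor reading.
import numpy as np

def estimate_missing(readings, node, t):
    """readings: 2-D array [node, time] with np.nan marking a missing value."""
    temporal = [readings[node, t - 1] if t > 0 else np.nan,
                readings[node, t + 1] if t + 1 < readings.shape[1] else np.nan]
    spatial = np.delete(readings[:, t], node)          # other nodes, same time step
    candidates = np.concatenate([temporal, spatial])
    return float(np.nanmean(candidates))               # average over the known values

if __name__ == "__main__":
    data = np.array([[20.0, 21.0, 22.0],
                     [20.5, np.nan, 22.5],
                     [19.5, 20.5, 21.5]])
    data[1, 1] = estimate_missing(data, node=1, t=1)
    print(data[1, 1])   # 21.125, a blend of temporal and spatial neighbours
```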
|
110 |
Traffic Analysis, Modeling and Their Applications in Energy-Constrained Wireless Sensor Networks : On Network Optimization and Anomaly Detection. Wang, Qinghua, January 2010 (has links)
Wireless sensor networks (WSNs) have emerged as a promising technology thanks to recent advances in electronics, networking, and information processing. A wide range of WSN applications have been proposed, such as habitat monitoring, environmental observation and forecasting systems, and health monitoring. In these applications, many low-power and inexpensive sensor nodes are deployed in a vast space to cooperate as a network. Although WSN is a promising technology, a great deal of additional research is still required before it finally becomes a mature technology. This dissertation concentrates on three factors which are holding back the development of WSNs. Firstly, there is a lack of traffic analysis and modeling for WSNs. Secondly, network optimization for WSNs needs more investigation. Thirdly, the development of anomaly detection techniques for WSNs remains a seldom-touched area.

In the field of traffic analysis and modeling for WSNs, this dissertation presents several ways of modeling different aspects of WSN traffic, including the modeling of sequence relations among arriving packets, the modeling of the data traffic arrival process for an event-driven WSN, and the modeling of the traffic load distribution for a symmetric dense WSN. These results enrich the current understanding of the traffic dynamics within WSNs and provide a basis for further work on network optimization and anomaly detection for WSNs.

In the field of network optimization for WSNs, this dissertation presents network optimization models from which network performance bounds can be derived. It also investigates network performance constrained by the energy resources available in an identified bottleneck zone. For a symmetric dense WSN, an optimal energy allocation scheme is proposed to minimize the energy waste due to the uneven energy drain among sensor nodes. By modeling the interrelationships among communication traffic, energy consumption and WSN performance, the presented results efficiently integrate knowledge of WSN traffic dynamics into the field of network optimization for WSNs.

Finally, in the field of anomaly detection for WSNs, this dissertation uses two examples to demonstrate the feasibility and the ease of detecting sensor network anomalies through the analysis of network traffic. The presented results will serve as an inspiration for the research community to develop more secure and more fault-tolerant WSNs. / STC
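The bottleneck-zone argument mentioned above lends itself to a back-of-the-envelope sketch: since all traffic towards the sink must be relayed by nodes inside the bottleneck zone, the zone's energy budget caps the achievable network lifetime. The numbers below are invented for illustration and are not taken from the thesis.

```python
# Back-of-the-envelope upper bound on network lifetime imposed by the energy
# available in the bottleneck zone around the sink.
def lifetime_upper_bound(nodes_in_zone, battery_joules, e_per_bit_relay, bits_per_second):
    """Seconds until the bottleneck zone runs out of energy for relaying."""
    total_energy = nodes_in_zone * battery_joules            # J available in the zone
    power_needed = e_per_bit_relay * bits_per_second         # J/s spent on relaying
    return total_energy / power_needed

if __name__ == "__main__":
    seconds = lifetime_upper_bound(nodes_in_zone=30,
                                   battery_joules=2000.0,    # assumed usable energy per node
                                   e_per_bit_relay=1e-6,     # assumed receive+retransmit cost per bit
                                   bits_per_second=50_000)   # assumed aggregate traffic to the sink
    print(f"lifetime bound: {seconds / 86400:.1f} days")     # about 13.9 days
```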
|